TITLE OF THE INVENTION

GENES OF CAROTENOID BIOSYNTHESIS AND METABOLISM 
AND A SYSTEM FOR SCREENING FOR SUCH GENES 

BACKGROUND OF THE INVENTION 
Field of the Invention 

The present invention describes the DNA sequences for
eukaryotic genes encoding ε cyclase, isopentenyl pyrophosphate
isomerase (IPP isomerase) and β-carotene hydroxylase, as well as vectors
containing the same and hosts transformed with said vectors.
The present invention also provides a method for augmenting 
the accumulation of carotenoids and production of novel and 
rare carotenoids. The present invention provides methods for 
controlling the ratio of various carotenoids in a host. 
Additionally, the present invention provides a method for 
screening for eukaryotic genes encoding enzymes of carotenoid 
biosynthesis and metabolism. 

Discussion of the Background

Carotenoid pigments with cyclic endgroups are essential 
components of the photosynthetic apparatus in oxygenic 
photosynthetic organisms (e.g., cyanobacteria, algae and 
plants; Goodwin, 1980). The symmetrical bicyclic yellow
carotenoid pigment β-carotene (or, in rare cases, the
asymmetrical bicyclic α-carotene) is intimately associated
with the photosynthetic reaction centers and plays a vital
role in protecting against potentially lethal photooxidative
damage (Koyama, 1991). β-Carotene and other carotenoids
derived from it or from α-carotene also serve as light-harvesting
pigments (Siefermann-Harms, 1987), are involved in
the thermal dissipation of excess light energy captured by the
light-harvesting antenna (Demmig-Adams & Adams, 1992), provide
substrate for the biosynthesis of the plant growth regulator
abscisic acid (Rock & Zeevaart, 1991; Parry & Horgan, 1991),
and are precursors of vitamin A in human and animal diets
(Krinsky, 1987). Plants also exploit carotenoids as coloring
agents in flowers and fruits to attract pollinators and agents
of seed dispersal (Goodwin, 1980). The color provided by
carotenoids is also of agronomic value in a number of 
important crops. Carotenoids are currently harvested from 
plants for use as pigments in food and feed. 

The probable pathway for formation of cyclic carotenoids 
in plants, algae and cyanobacteria is illustrated in Figure 1. 
Two types of cyclic endgroups are commonly found in higher 
plant carotenoids; these are referred to as the β and ε cyclic
endgroups (Fig. 3; the acyclic endgroup is referred to as the
ψ or psi endgroup). These cyclic endgroups differ only in the
position of the double bond in the ring. Carotenoids with two
β rings are ubiquitous, and those with one β and one ε ring
are common, but carotenoids with two ε rings are rarely
detected. β-Carotene (Fig. 1) has two β endgroups and is a
symmetrical compound that is the precursor of a number of
other important plant carotenoids such as zeaxanthin and
violaxanthin (Fig. 2).
Carotenoid enzymes have previously been isolated from a
variety of sources including bacteria (Armstrong et al., 1989,
Mol. Gen. Genet. 216, 254-268; Misawa et al., 1990, J.
Bacteriol. 172, 6704-12), fungi (Schmidhauser et al., 1990,
Mol. Cell. Biol. 10, 5064-70), cyanobacteria (Chamovitz et
al., 1990, Z. Naturforsch. 45c, 482-86) and higher plants
(Hartley et al., Proc. Natl. Acad. Sci. USA 88, 6532-36;
Martinez-Ferez & Vioque, 1992, Plant Mol. Biol. 18, 981-83).
Many of the isolated enzymes show a great diversity in
function and inhibitory properties between sources. For
example, phytoene desaturases from Synechococcus and higher
plants carry out a two-step desaturation to yield ζ-carotene
as a reaction product, whereas the same enzyme from Erwinia
introduces four double bonds, forming lycopene. Similarity of
the amino acid sequences is very low for bacterial versus
plant enzymes. Therefore, even with a gene in hand from one 
source, it is difficult to screen for a gene with similar 
function in another source. In particular, the sequence 
similarity between prokaryotic and eukaryotic genes is quite 
low. 

Further, the mechanism of gene expression in prokaryotes 
and eukaryotes appears to differ sufficiently such that one 
cannot expect that an isolated eukaryotic gene will be
properly expressed in a prokaryotic host. 
The difficulties in isolating related genes are
exemplified by recent efforts to isolate the enzyme which
catalyzes the formation of β-carotene from the acyclic
precursor lycopene. Although this enzyme had been isolated in
a prokaryote, it had not been isolated from any photosynthetic
organism, nor had the corresponding genes been identified and
sequenced or the cofactor requirements established. The
isolation and characterization of the enzyme catalyzing
formation of β-carotene in the cyanobacterium Synechococcus
PCC7942 was described by the present inventors and others
(Cunningham et al., 1993 and 1994).

The need remains for the isolation of eukaryotic genes
involved in the carotenoid biosynthetic pathway, including
genes encoding an ε cyclase, an IPP isomerase and a β-carotene
hydroxylase. There remains a need for methods to enhance the
production of carotenoids. There also remains a need in the 
art for methods for screening for eukaryotic genes encoding 
enzymes of carotenoid biosynthesis and metabolism. 

SUMMARY OF THE INVENTION

Accordingly, a first object of this invention is to 
provide isolated eukaryotic genes which encode enzymes 
involved in carotenoid biosynthesis; in particular, ε cyclase,
IPP isomerase and β-carotene hydroxylase.

A second object of this invention is to provide
eukaryotic genes which encode enzymes which produce novel 
carotenoids. 

A third object of the present invention is to provide 
vectors containing said genes. 

A fourth object of the present invention is to provide
hosts transformed with said vectors. 

Another object of the present invention is to provide 
hosts which accumulate novel or rare carotenoids or which
overexpress known carotenoids. 

Another object of the present invention is to provide 
hosts with inhibited carotenoid production. 

Another object of this invention is to secure the 
expression of eukaryotic carotenoid-related genes in a 
recombinant prokaryotic host. 

A final object of the present invention is to provide a 
method for screening for eukaryotic genes which encode enzymes 
involved in carotenoid biosynthesis and metabolism. 

These and other objects of the present invention have 
been realized by the present inventors as described below. 

BRIEF DESCRIPTION OF THE DRAWINGS 
A more complete appreciation of the invention and many of 
the attendant advantages thereof will be readily obtained as 
the same becomes better understood by reference to the 
following detailed description when considered in connection
with the accompanying drawings, wherein: 

Figure 1 is a schematic representation of the pathway of 
β-carotene biosynthesis in cyanobacteria, algae and plants.
The enzymes catalyzing various steps are indicated at the
left. Target sites of the bleaching herbicides NFZ and MPTA
are also indicated at the left. Abbreviations: DMAPP,
dimethylallyl pyrophosphate; FPP, farnesyl pyrophosphate;
GGPP, geranylgeranyl pyrophosphate; GPP, geranyl
pyrophosphate; IPP, isopentenyl pyrophosphate; LCY, lycopene
cyclase; MVA, mevalonic acid; MPTA, 2-(4-methylphenoxy)triethylamine
hydrochloride; NFZ, norflurazon;
PDS, phytoene desaturase; PSY, phytoene synthase; ZDS,
ζ-carotene desaturase; PPPP, prephytoene pyrophosphate.

Figure 2 depicts possible routes of synthesis of cyclic 
carotenoids and common plant and algal xanthophylls
(oxycarotenoids) from neurosporene. Demonstrated activities
of the β- and ε-cyclase enzymes of A. thaliana are indicated
by bold arrows labelled with β or ε, respectively. A bar below
the arrow leading to ε-carotene indicates that the enzymatic
activity was examined but no product was detected. The steps
marked by an arrow with a dotted line have not been
specifically examined. Conventional numbering of the carbon
atoms is given for neurosporene and α-carotene. Inverted
triangles (▼) mark positions of the double bonds introduced as 
a consequence of the desaturation reactions. 
Figure 3 depicts the carotene endgroups which are found
in plants. 

Figure 4 is a DNA sequence and the predicted amino acid 
sequence of ε cyclase isolated from A. thaliana (SEQ ID NOS: 1
and 2). These sequences were deposited under GenBank
accession number U50738. This cDNA is incorporated into the
plasmid pATeps.

Figure 5 is a DNA sequence encoding the β-carotene
hydroxylase isolated from A. thaliana (SEQ ID NO: 3). This
cDNA is incorporated into the plasmid pATOHB.

Figure 6 is an alignment of the predicted amino acid
sequences of A. thaliana β-carotene hydroxylase (SEQ ID NO: 4)
with bacterial enzymes from Alcaligenes sp. (SEQ ID NO: 5)
(GenBank D58422), Erwinia herbicola Eho10 (SEQ ID NO: 6)
(GenBank M87280), Erwinia uredovora (SEQ ID NO: 7) (GenBank
D90087) and Agrobacterium aurantiacum (SEQ ID NO: 8) (GenBank
D58420). A consensus sequence is also shown. A capital letter
in the consensus indicates that the residue is identical in all
five sequences; a lowercase letter indicates that three of the
five, including A. thaliana, have the identical residue. TM,
transmembrane.
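The consensus rule stated for Figure 6 can be illustrated with a minimal Python sketch. It assumes the five aligned sequences are supplied as equal-length strings with '.' marking gaps and the A. thaliana sequence listed first, reads "three of five" as "at least three of five", and uses '-' for unconserved positions; these conventions, the function names and the toy sequences are illustrative assumptions, not data from the figure.

```python
def consensus_symbol(column: str, thaliana_index: int = 0) -> str:
    """Consensus symbol for one alignment column (Figure 6 rule, as read here)."""
    ref = column[thaliana_index]          # residue of A. thaliana at this position
    if ref == ".":                        # gap in A. thaliana: nothing to report
        return "-"
    matches = column.count(ref)
    if matches == len(column):            # identical in all five sequences
        return ref.upper()
    if matches >= 3:                      # >= 3 of 5, A. thaliana among them
        return ref.lower()
    return "-"                            # not conserved (illustrative symbol)


def consensus_line(aligned: list[str], thaliana_index: int = 0) -> str:
    """Build the consensus line for equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    return "".join(
        consensus_symbol("".join(seq[i] for seq in aligned), thaliana_index)
        for i in range(length)
    )


# Hypothetical toy alignment (A. thaliana first), not real hydroxylase data.
toy = ["MKLAV", "MKIAC", "MRIGV", "MTLSC", "MQLAV"]
print(consensus_line(toy))  # M-lav
```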

Figure 7 is a DNA sequence of a cDNA encoding an IPP
isomerase isolated from A. thaliana (SEQ ID NO: 9). This cDNA
is incorporated into the plasmid pATDP5.

Figure 8 is a DNA sequence of a second cDNA encoding
another IPP isomerase isolated from A. thaliana (SEQ ID NO:
10). This cDNA is incorporated into the plasmid pATDP7.

Figure 9 is a DNA sequence of a cDNA encoding an IPP
isomerase isolated from Haematococcus pluvialis (SEQ ID NO:
11). This cDNA is incorporated into the plasmid pHP04.

Figure 10 is a DNA sequence of a second cDNA encoding
another IPP isomerase isolated from Haematococcus pluvialis
(SEQ ID NO: 12). This cDNA is incorporated into the plasmid
pHP05.

Figure 11 is an alignment of the predicted amino acid
sequences of the IPP isomerases isolated from A. thaliana (SEQ
ID NOS: 16 and 18), H. pluvialis (SEQ ID NOS: 14 and 15),
Clarkia breweri (SEQ ID NO: 17) (see Blanc & Pichersky,
Plant Physiol. (1995) 108:855; GenBank accession no. X82627)
and Saccharomyces cerevisiae (SEQ ID NO: 19) (GenBank
accession no. J05090).

Figure 12 is a DNA sequence of the cDNA encoding an IPP
isomerase isolated from marigold (SEQ ID NO: 13). This cDNA
is incorporated into the plasmid pMDP1. The x's denote a
region not yet sequenced at the time this application was
prepared.

Figure 13 is an alignment of the consensus sequence of 4
plant β-cyclases (SEQ ID NO: 20) with the A. thaliana ε-cyclase
(SEQ ID NO: 21). A capital letter in the plant β
consensus is used where all 4 β cyclase genes predict the same
amino acid residue in this position. A lowercase letter indicates
that an identical residue was found in 3 of the 4. Dashes
indicate that the amino acid residue was not conserved, and
dots in the sequence denote a gap. A consensus for the
aligned sequences is given in capital letters below the
alignment where the β and ε cyclases have the same amino acid
residue. Arrows indicate some of the conserved amino acids
that will be used as junction sites for construction of
chimeric cyclases with novel enzymatic activities. Several
regions of interest, including a sequence signature indicative
of a dinucleotide-binding motif and 2 predicted transmembrane
(TM) helical regions, are indicated below the alignment and are
underlined.
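The two consensus levels described for Figure 13 can be sketched in Python as follows. The sketch assumes four equal-length aligned β cyclase sequences with '.' as the gap mark, and it counts a lowercase (3 of 4) β consensus residue as agreeing with the ε cyclase when the letters match; the function names, the gap handling and the toy sequences are hypothetical.

```python
from collections import Counter


def beta_consensus_symbol(column: str) -> str:
    """Plant beta-cyclase consensus for one column of four aligned residues."""
    residue, count = Counter(column).most_common(1)[0]
    if residue == ".":                    # mostly gaps (assumed handling)
        return "." if count >= 3 else "-"
    if count == 4:                        # all four beta cyclases agree
        return residue.upper()
    if count == 3:                        # three of the four agree
        return residue.lower()
    return "-"                            # not conserved


def beta_consensus(betas: list[str]) -> str:
    """Consensus line over four equal-length aligned beta cyclase sequences."""
    return "".join(
        beta_consensus_symbol("".join(seq[i] for seq in betas))
        for i in range(len(betas[0]))
    )


def shared_line(beta_cons: str, epsilon: str) -> str:
    """Capital letter wherever the beta consensus and epsilon cyclase agree."""
    return "".join(
        b.upper() if b.isalpha() and b.upper() == e.upper() else " "
        for b, e in zip(beta_cons, epsilon)
    )


# Hypothetical toy sequences, not the Figure 13 data.
cons = beta_consensus(["MGALK", "MGSLR", "MGALK", "MGAFR"])
print(cons)                               # MGal-
print(repr(shared_line(cons, "MDALK")))   # 'M AL '
```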

DESCRIPTION OF THE PREFERRED EMBODIMENTS 
Isolated eukaryotic genes which encode enzymes involved in
carotenoid biosynthesis

The present inventors have now isolated eukaryotic genes 
encoding ε cyclase and β-carotene hydroxylase from A. thaliana
and IPP isomerases from several sources.

The present inventors have now isolated the eukaryotic 
gene encoding the enzyme IPP isomerase, which catalyzes the
conversion of isopentenyl pyrophosphate (IPP) to dimethylallyl
pyrophosphate (DMAPP). IPP isomerases were isolated from A.
thaliana, H. pluvialis and marigold.

Alignments of these are shown in Figure 11 (excluding the
marigold sequence). Plasmids containing these genes were
deposited with the American Type Culture Collection, 12301
Parklawn Drive, Rockville MD 20852 on March 4, 1996 under ATCC
accession numbers 98000 (pHP05 - H. pluvialis); 98001 (pMDP1 -
marigold); 98002 (pATDP7 - A. thaliana) and 98004 (pHP04 - H.
pluvialis).

The present inventors have also isolated the gene 
encoding the enzyme ε cyclase, which is responsible for the
formation of ε endgroups in carotenoids. A gene encoding an ε
cyclase from any organism has not heretofore been described.
The A. thaliana ε cyclase adds an ε ring to only one end of
the symmetrical lycopene, while the related β-cyclase adds a
ring at both ends. The DNA of the present invention is shown
in Figure 4 and SEQ ID NO: 1. A plasmid containing this gene
was deposited with the American Type Culture Collection, 12301
Parklawn Drive, Rockville MD 20852 on March 4, 1996 under ATCC
accession number 98005 (pATeps - A. thaliana).

The present inventors have also isolated the gene 
encoding the enzyme β-carotene hydroxylase, which is
responsible for hydroxylating the β endgroup in carotenoids.
The DNA of the present invention is shown in SEQ ID NO: 3 and
Figure 5. The full-length gene product hydroxylates both end
groups of β-carotene, as do the products of genes which encode
proteins truncated by up to 50 amino acids from the N-terminus.
Products of genes which encode proteins truncated by
about 60-110 amino acids from the N-terminus
preferentially hydroxylate only one ring. A plasmid
containing this gene was deposited with the American Type
Culture Collection, 12301 Parklawn Drive, Rockville MD 20852
on March 4, 1996 under ATCC accession number 98003 (pATOHB -
A. thaliana).
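The relationship reported above between N-terminal truncation of the A. thaliana β-carotene hydroxylase and ring hydroxylation can be summarized as a simple lookup. This is a sketch only: the function name is hypothetical, the "about 60-110" range is treated as inclusive integer bounds, and truncations of 51-59 or more than 110 amino acids are reported as not described because the text gives no data for them.

```python
def expected_ring_hydroxylation(residues_removed: int) -> str:
    """Map an N-terminal truncation length (amino acids) to the behaviour
    reported here for A. thaliana beta-carotene hydroxylase."""
    if residues_removed < 0:
        raise ValueError("truncation length cannot be negative")
    if residues_removed <= 50:
        return "both rings"       # full length or up to 50 aa removed
    if 60 <= residues_removed <= 110:
        return "one ring"         # about 60-110 aa removed
    return "not described"        # 51-59 or > 110 aa: no data in the text


for n in (0, 40, 75, 120):
    print(n, expected_ring_hydroxylation(n))
```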

Eukaryotic genes which encode enzymes which produce novel or
rare carotenoids

The present invention also relates to novel enzymes which
can transform known carotenoids into novel or rare products.
That is, ε-carotene (see Figure 2) and γ-carotene currently
can only be isolated in minor amounts. As described below, an
enzyme can be produced which would transform lycopene to
γ-carotene, or lycopene to ε-carotene. With these products in
hand, bulk synthesis of other carotenoids derived from them
is possible. For example, ε-carotene can be hydroxylated to
form an isomer of lutein (one ε- and one β-ring) and zeaxanthin (two
β-rings) in which both endgroups are, instead, ε-rings.

The eukaryotic genes in the carotenoid biosynthetic 
pathway differ from their prokaryotic counterparts in their 5' 
region. As used herein, the 5' region is the region of 
eukaryotic DNA which precedes the initiation codon of the 
counterpart gene in prokaryotic DNA. That is, when the
consensus areas of eukaryotic and prokaryotic genes are
aligned, the eukaryotic genes contain additional coding
sequences upstream of the prokaryotic initiation codon.

The present inventors have found that the amount of the
5' region present can alter the activity of the eukaryotic
enzyme. Instead of diminishing activity, truncating the 5'
region of the eukaryotic gene results in an enzyme with a
different specificity. Thus, the present invention relates to
enzymes which are truncated to within 0-50, preferably 0-25,
codons of the 5' initiation codon of their prokaryotic
counterparts as determined by alignment maps.
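Locating a truncation point "as determined by alignment maps" can be sketched as follows, assuming a pairwise protein alignment of the eukaryotic enzyme and its prokaryotic counterpart is already available as two equal-length strings with '-' for gaps. The functions (hypothetical names) find the eukaryotic residue aligned with the first prokaryotic residue, step back a chosen number of codons (0-50, preferably 0-25), and report the matching nucleotide offset from the eukaryotic ATG. How a start codon is supplied to the truncated construct is not modeled.

```python
def prokaryotic_start_index(euk_aln: str, prok_aln: str, gap: str = "-") -> int:
    """Position (0-based, ungapped eukaryotic coordinates) of the eukaryotic
    residue in the column where the prokaryotic sequence begins."""
    if len(euk_aln) != len(prok_aln):
        raise ValueError("aligned sequences must have equal length")
    first_col = next((i for i, c in enumerate(prok_aln) if c != gap), None)
    if first_col is None:
        raise ValueError("prokaryotic row contains no residues")
    # Count eukaryotic residues (not gaps) that precede that column.
    return sum(1 for c in euk_aln[:first_col] if c != gap)


def truncation_start(euk_aln: str, prok_aln: str, keep_upstream: int = 0,
                     max_upstream: int = 50, gap: str = "-") -> tuple[int, int]:
    """(first residue kept, offset of its codon from the A of the eukaryotic
    ATG) for a gene keeping `keep_upstream` codons upstream of the
    prokaryotic initiation codon."""
    if not 0 <= keep_upstream <= max_upstream:
        raise ValueError(f"keep_upstream must be between 0 and {max_upstream}")
    start = max(0, prokaryotic_start_index(euk_aln, prok_aln, gap) - keep_upstream)
    return start, 3 * start


# Hypothetical toy alignment: an 8-residue eukaryotic N-terminal extension.
euk = "MSTRPLKQMAVLE"
prok = "--------MAILE"
print(truncation_start(euk, prok, keep_upstream=3))  # (5, 15)
```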

For example, as discussed above, when the gene encoding 
A. thaliana β-carotene hydroxylase was truncated, the
resulting enzyme catalyzed the formation of β-cryptoxanthin as
the major product and zeaxanthin as a minor product, in contrast to
its normal production of zeaxanthin.

In addition to novel enzymes produced by truncating the 
5' region of known enzymes, novel enzymes which can 
participate in the formation of novel carotenoids can be 
formed by replacing portions of one gene with an analogous
sequence from a structurally related gene. For example,
β-cyclase and ε-cyclase are structurally related (see Figure
13). By replacing a portion of β-lycopene cyclase with the
analogous portion of ε-cyclase, an enzyme which produces
γ-carotene (one cyclic endgroup) will be produced. Further, by replacing
a portion of the ε-lycopene cyclase with the analogous portion
of β-cyclase, an enzyme which produces ε-carotene will be
produced (ε-cyclase normally produces a compound with one ε
endgroup (δ-carotene), not two). Similarly, β-hydroxylase could
be modified to produce enzymes of novel function by creation
of hybrids with ε-hydroxylase.
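The hybrid-enzyme approach, in which part of one cyclase is replaced by the analogous part of a structurally related cyclase at a conserved residue, can be sketched at the level of an aligned protein pair. The function name and the sequences below are hypothetical toy examples rather than the β- and ε-cyclase sequences of Figure 13, and translating the junction back into a DNA construct is not shown.

```python
def make_chimera(n_part_aln: str, c_part_aln: str, junction: int,
                 gap: str = "-") -> str:
    """Join the N-terminal portion of one aligned protein to the C-terminal
    portion of another at a column where both carry the same residue."""
    if len(n_part_aln) != len(c_part_aln):
        raise ValueError("aligned sequences must have equal length")
    shared = n_part_aln[junction]
    if shared == gap or shared != c_part_aln[junction]:
        raise ValueError("junction must be a conserved, ungapped column")
    hybrid = n_part_aln[:junction] + c_part_aln[junction:]
    return hybrid.replace(gap, "")        # drop alignment gaps to get the protein


# Hypothetical aligned toy sequences; column 5 carries a shared 'L'.
beta_like = "MKW-DLGAPF"
epsilon_like = "MRWEELGSPY"
print(make_chimera(beta_like, epsilon_like, 5))  # MKWDLGSPY
```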

Vectors 

The genes encoding the carotenoid enzymes as described 
above, when cloned into a suitable expression vector, can be 
used to overexpress these enzymes in a plant expression system 
or to inhibit the expression of these enzymes. For example, a
vector containing the gene encoding ε-cyclase can be used to
increase the amount of α-carotene in an organism and thereby
alter the nutritional value, pharmacology and visual
appearance value of the organism. 

In a preferred embodiment, the vectors of the present 
invention contain a DNA encoding an eukaryotic IPP isomerase 
upstream of a DNA encoding a second eukaryotic carotenoid 
enzyme. The inventors have discovered that inclusion of an 
IPP isomerase gene increases the supply of substrate for the
carotenoid pathway, thereby enhancing the production of
carotenoid endproducts. This is apparent from the much deeper
pigmentation in carotenoid-accumulating colonies of E. coli
which also contain one of the aforementioned IPP isomerase 
genes when compared to colonies that lack this additional IPP 
isomerase gene. Similarly, a vector comprising an IPP 
isomerase gene can be used to enhance production of any 
secondary metabolite of dimethylallyl pyrophosphate (such as 
isoprenoids, steroids, carotenoids, etc.). 
Alternatively, an antisense strand of one of the above
genes can be inserted into a vector. For example, the
ε-cyclase gene can be inserted into a vector in antisense orientation and incorporated
into the genomic DNA of a host, thereby inhibiting the
synthesis of ε-ring-containing carotenoids (lutein and α-carotene) and
enhancing the synthesis of β,β carotenoids (zeaxanthin and β-carotene).
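The antisense construct described above places the gene in reverse orientation relative to the promoter, so that the transcript is complementary to the normal mRNA. A minimal Python sketch of the corresponding sequence operation, the reverse complement of a 5'->3' DNA string, is given here for illustration; the function name is hypothetical and only the bases A, C, G, T and N are accepted.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def antisense(sense: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'.

    The result is the antisense strand, also written 5'->3'."""
    invalid = set(sense) - set("ACGTNacgtn")
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    return sense.translate(_COMPLEMENT)[::-1]


print(antisense("ATGGCCTTA"))  # TAAGGCCAT
```

Applying the function twice returns the original sequence, which is a convenient check when assembling an antisense construct in silico.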

Suitable vectors according to the present invention,
comprising a eukaryotic gene encoding an enzyme involved in
carotenoid biosynthesis or metabolism and a suitable promoter
for the host, can be constructed using techniques well known in
the art (for example, Sambrook et al., Molecular Cloning: A
Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring
Harbor, NY, 1989).

Suitable vectors for eukaryotic expression in plants are 
described in Frey et al., Plant J. (1995) 8(5):693 and Misawa
et al., 1994a; incorporated herein by reference.

Suitable vectors for prokaryotic expression include
pACYC184, pUC119 and pBR322 (available from New England
BioLabs, Beverly, MA), pTrcHis (Invitrogen) and pET28
(Novagen), and derivatives thereof.

The vectors of the present invention can additionally 
contain regulatory elements such as promoters, repressors,
selectable markers such as antibiotic resistance genes, etc. 
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Host:s .. • * 

Host systems according to the present invention can
comprise any organism that already produces carotenoids or
which has been genetically modified to produce carotenoids.
The IPP isomerase genes are more broadly applicable for
enhancing production of any product dependent on DMAPP as a 
precursor. 

Organisms which already produce carotenoids include 
plants, algae, some yeasts, fungi and cyanobacteria and other 
photosynthetic bacteria. Transformation of these hosts with 
vectors according to the present invention can be done using 
standard techniques such as those described in Misawa et al.
(1990) supra; Hundle et al. (1993) supra; Hundle et al.
(1991) supra; Misawa et al. (1991) supra; Sandmann et al.,
supra; and Schnurr et al., supra; all incorporated herein by
reference.

Alternatively, transgenic organisms can be constructed 
which include the DNA sequences of the present invention (Bird
et al, 1991; Bramley et al, 1992; Misawa et al, 1994a; Misawa 
et al, 1994b; Cunningham et al, 1993). The incorporation of 
these sequences can allow the controlling of carotenoid 
biosynthesis, content, or composition in the host cell. These 
transgenic systems can be constructed to incorporate sequences 
which allow over-expression of the carotenoid genes of the 
present invention. Transgenic systems can also be constructed 
containing antisense expression of the DNA sequences of the 
present invention. Such antisense expression would result in
the accumulation of the substrates of the enzyme encoded by
the sense strand.

A method for screening for eukaryotic genes which encode
enzymes involved in carotenoid biosynthesis

The method of the present invention comprises
transforming a prokaryotic host with a DNA which may contain a
eukaryotic or prokaryotic carotenoid biosynthetic gene;
culturing said transformed host to obtain colonies; and
screening for colonies exhibiting a different color than
colonies of the untransformed host.

Suitable hosts include E. coli, cyanobacteria such as
Synechococcus and Synechocystis, algae and plant cells. E.
coli is preferred.

In a preferred embodiment, the above "color
complementation test" can be enhanced by using mutants which
are either (1) deficient in at least one carotenoid
biosynthetic gene or (2) overexpress at least one carotenoid
biosynthetic gene. In either case, such mutants will
accumulate carotenoid precursors.

Prokaryotic and eukaryotic DNA libraries can be screened 
in total for the presence of genes of carotenoid biosynthesis, 
metabolism and degradation. Preferred organisms to be 
screened include photosynthetic organisms. 
E. coli can be transformed with these eukaryotic cDNA
libraries using conventional methods such as those described
in Sambrook et al., 1989 and according to protocols described
by the vendors of the cloning vectors.

For example, the cDNA libraries in bacteriophage vectors
such as lambdaZAP (Stratagene) or lambdaZipLox (Gibco BRL)
can be excised en masse and used to transform E. coli, or the
cDNA inserts can be inserted into suitable vectors and these
vectors can then be used to transform E. coli. Suitable vectors
include pACYC184, pUC119 and pBR322 (available from New England
BioLabs, Beverly, MA). pACYC is preferred.

Transformed E. coli can be cultured using conventional
techniques. The culture broth preferably contains antibiotics
to select and maintain plasmids. Suitable antibiotics include
penicillin, ampicillin, chloramphenicol, etc. Culturing is
typically conducted at 20-40°C, preferably at room temperature
(20-25°C), for 12 hours to 7 days.

Cultures are plated and the plates are screened visually 
for colonies with a different color than the colonies of the 
untransformed host E. coli. For example, E. coli transformed
with the plasmid pAC-BETA (described below) produce yellow
colonies that accumulate β-carotene. After transformation
with a cDNA library, colonies with a different hue
than those formed by E. coli/pAC-BETA would be expected to
contain enzymes which modify the structure or the level of
accumulation of β-carotene. Similar standards can be engineered
which overexpress earlier products in carotenoid biosynthesis,
such as lycopene, γ-carotene, etc.
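The visual screen amounts to comparing each colony against the color of the reference strain. The sketch below records only the reference plasmids, pigments and colony colors stated in this document; the colony identifiers, the function name and the use of text labels in place of visual scoring are illustrative assumptions.

```python
# Reference strains and colony colours as stated in this document.
REFERENCE_STRAINS = {
    "pAC-LYC": ("lycopene", "pink"),
    "pAC-BETA": ("beta-carotene", "yellow"),
    "pAC-BETA-04": ("beta-carotene", "orange"),  # "orange (deep yellow)"
}


def candidate_colonies(reference: str, scored: dict[str, str]) -> list[str]:
    """IDs of colonies whose scored colour differs from the reference
    background; these are presumptive carriers of a carotenoid gene."""
    _pigment, background = REFERENCE_STRAINS[reference]
    return sorted(cid for cid, colour in scored.items() if colour != background)


# Hypothetical plate scored by eye.
plate = {"c1": "pink", "c2": "yellow", "c3": "pink", "c4": "pinkish-yellow"}
print(candidate_colonies("pAC-LYC", plate))  # ['c2', 'c4']
```

Flagged colonies are only presumptive hits; the examples confirm them by pigment extraction and TLC or HPLC.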

Having generally described this invention, a further
understanding can be obtained by reference to certain specific
examples which are provided herein for purposes of
illustration only and are not intended to be limiting unless
otherwise specified.

EXAMPLES

I. Isolation of β-carotene hydroxylase
Plasmid Construction

An 8.6-kb BglII fragment containing the carotenoid
biosynthetic genes of Erwinia herbicola was first cloned in
the BamHI site of plasmid vector pACYC184 (chloramphenicol
resistant), and then a 1.1-kb BamHI fragment containing the
β-carotene hydroxylase gene (crtZ) was deleted. The resulting
plasmid, pAC-BETA, contains all the genes for the formation of
β-carotene. E. coli strains containing this plasmid accumulate
β-carotene and form yellow colonies (Cunningham et al., 1994).

A full-length gene encoding IPP isomerase of
Haematococcus pluvialis (HP04) was first cut out with
BamHI-KpnI from pBluescript SK+ and then cloned into a pTrcHisA
vector with high-level expression from the trc promoter
(Invitrogen Inc.). A fragment containing the IPP isomerase
and trc promoter was excised with EcoRV-KpnI and cloned in the
HindIII site of pAC-BETA. E. coli cells transformed with this
new plasmid, pAC-BETA-04, form orange (deep yellow) colonies on
LB plates and accumulate more β-carotene than cells that
contain pAC-BETA.

Screening of the Arabidopsis cDNA Library

Several λ cDNA expression libraries of Arabidopsis were
obtained from the Arabidopsis Biological Resource Center (Ohio
State University, Columbus, OH) (Kieber et al., 1993). The λ
cDNA libraries were excised in vivo using Stratagene's
ExAssist/SOLR system to produce a phagemid cDNA library
wherein each clone also contained an ampicillin resistance gene.

E. coli strain DH10B ZIP was chosen as the host for
the screening and pigment production. DH10B cells were
transformed with plasmid pAC-BETA-04 and were plated on LB
agar plates containing chloramphenicol at 50 µg/ml (from
United States Biochemical Corporation). The phagemid
Arabidopsis cDNA library was then introduced into DH10B cells
already containing pAC-BETA-04. Transformed cells containing
both pAC-BETA-04 and Arabidopsis cDNA were selected on
chloramphenicol plus ampicillin (150 µg/ml) agar plates.
Maximum color development occurred after 5 days of incubation at
room temperature, and lighter yellow colonies were selected.
Selected colonies were inoculated into 3 ml liquid LB medium
containing ampicillin and chloramphenicol, and cultures were
incubated. Cells were then pelleted and extracted in 80 µl of
100% acetone in microfuge tubes. After centrifugation, the
pigmented supernatant was spotted on silica gel thin-layer
chromatography (TLC) plates and developed with a hexane:ether
(1:1) solvent system. β-Carotene hydroxylase clones
were identified based on the appearance of zeaxanthin on the TLC
plate.
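The antibiotic working concentrations used in this screen (chloramphenicol at 50 µg/ml and ampicillin at 150 µg/ml) translate into stock volumes by C1V1 = C2V2. The function name, the stock strengths and the medium volume in the example are hypothetical, not taken from the text.

```python
def stock_volume_ul(final_ug_per_ml: float, stock_mg_per_ml: float,
                    medium_ml: float) -> float:
    """Volume of antibiotic stock (µl) giving the final concentration,
    from C1*V1 = C2*V2. A stock in mg/ml is the same as µg/µl."""
    if stock_mg_per_ml <= 0:
        raise ValueError("stock concentration must be positive")
    return final_ug_per_ml * medium_ml / stock_mg_per_ml


# Working concentrations from the text; stock strengths are hypothetical.
amp = stock_volume_ul(150, 100, 500)   # ampicillin, 100 mg/ml stock
cam = stock_volume_ul(50, 34, 500)     # chloramphenicol, 34 mg/ml stock
print(f"{amp:.0f} µl ampicillin, {cam:.0f} µl chloramphenicol per 500 ml")
```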

Subcloning and Sequencing

The β-carotene hydroxylase cDNA was isolated by standard
procedures (Sambrook et al., 1989). Restriction maps showed
that three independent inserts (1.9 kb, 0.9 kb and 0.5 kb)
existed in the cDNA. To determine which cDNA insert confers
the β-carotene hydroxylase activity, plasmid DNA was digested
with NotI (a site in the adaptor of the cDNA library) and the
three inserts were subcloned into the NotI site of SK vectors.
These subclones were used to transform E. coli cells
containing pAC-BETA-04 again to test the hydroxylase activity.
A fragment of 0.95 kb, later shown to contain the hydroxylase
gene, was also blunt-ended and cloned into pTrcHis A, B and C
vectors. To remove the N-terminal sequence, a restriction
site (BglII) was used that lies just before the sequence
conserved with the bacterial genes. A BglII-XhoI fragment was
directionally cloned in BamHI-XhoI digested trc vectors.
Functional clones were identified by the color complementation
test. A β-carotene hydroxylase enzyme produces a colony with
a lighter yellow color than is found in cells containing
pAC-BETA-04 alone.

Arabidopsis β-carotene hydroxylase was sequenced
completely on both strands on an automatic sequencer (Applied
Biosystems, Model 373A, Version 2.0.1S).

Pigment Analysis 

A single colony was used to inoculate 50 ml of LB 
containing ampicillin and chloramphenicol in a 250-ml flask. 
Cultures were incubated at 28°C for 36 hours with gentle
shaking, and then harvested at 5000 rpm in an SS-34 rotor.
The cells were washed once with distilled H2O and resuspended
in 0.5 ml of water. The extraction procedures and HPLC were
essentially as described previously (Cunningham et al., 1994).

II. Isolation of ε cyclase
Plasmid Construction 

Construction of plasmids pAC-LYC, pAC-NEUR, and pAC-ZETA
is described in Cunningham et al. (1994). In brief, the
appropriate carotenoid biosynthetic genes from Erwinia
herbicola, Rhodobacter capsulatus, and Synechococcus sp.
strain PCC7942 were cloned in the plasmid vector pACYC184 (New
England BioLabs, Beverly, MA). Cultures of E. coli containing
the plasmids pAC-ZETA, pAC-NEUR, and pAC-LYC accumulate
ζ-carotene, neurosporene, and lycopene, respectively. The
plasmid pAC-BETA was constructed as follows: an 8.6-kb BglII
fragment containing the carotenoid biosynthetic genes of E.
herbicola (GenBank M87280; Hundle et al., 1991) was obtained
after partial digestion of plasmid pPL376 (Perry et al., 1986;
Tuveson et al., 1986) and cloned in the BamHI site of pACYC184
to give the plasmid pAC-EHER. Deletion of adjacent 0.8- and
1.1-kb BamHI-BamHI fragments (deletion Z in Cunningham et al.,
1994), and of a 1.1-kb SalI-SalI fragment (deletion X), served
to remove most of the coding regions for the E. herbicola
β-carotene hydroxylase (crtZ gene) and zeaxanthin
glucosyltransferase (crtX gene), respectively. The resulting
plasmid, pAC-BETA, retains functional genes for geranylgeranyl
pyrophosphate synthase (crtE), phytoene synthase (crtB),
phytoene desaturase (crtI), and lycopene cyclase (crtY).
Cells of E. coli containing this plasmid form yellow colonies
and accumulate β-carotene. A plasmid containing both the ε-
and β-cyclase cDNAs of A. thaliana was constructed by excising
the ε cyclase in clone y2 as a PvuI-PvuII fragment and
ligating this piece in the SnaBI site of a plasmid (pSPORT 1
from GIBCO-BRL) that already contained the β cyclase.

Organisms and Growth Conditions

E. coli strains TOP10 and TOP10 F' (obtained from
Invitrogen Corporation, San Diego, CA) and XL1-Blue
(Stratagene) were grown in Luria-Bertani (LB) medium (Sambrook
et al., 1989) at 37°C in darkness on a platform shaker at 225
cycles per min. Media components were from Difco (yeast
extract and tryptone) or Sigma (NaCl). Ampicillin at 150
µg/mL and/or chloramphenicol at 50 µg/mL (both from United
States Biochemical Corporation) were used, as appropriate, for
selection and maintenance of plasmids.

Mass Excision and Color Complementation Screening of an A.
thaliana cDNA Library

A size-fractionated 1-2 kb cDNA library of A. thaliana in
lambda ZAPII (Kieber et al., 1993) was obtained from the
Arabidopsis Biological Resource Center at The Ohio State
University (stock number CD4-14). Other size-fractionated
libraries were also obtained (stock numbers CD4-13, CD4-15,
and CD4-16). An aliquot of each library was treated to cause
a mass excision of the cDNAs and thereby produce a phagemid
library according to the instructions provided by the supplier
of the cloning vector (Stratagene; E. coli strain XL1-Blue and
the helper phage R408 were used). The titre of the excised
phagemid was determined, and the library was introduced into a
lycopene-accumulating strain of E. coli TOP10 F' (this strain
contained the plasmid pAC-LYC) by incubation of the phagemid
with the E. coli cells for 15 min at 37°C. Cells had been
grown overnight at 30°C in LB medium supplemented with 2%
(w/v) maltose and 10 mM MgSO4 (final concentration), and
harvested in 1.5-ml microfuge tubes at a setting of 3 on an
Eppendorf microfuge (5415C) for 10 min. The pellets were
resuspended in 10 mM MgSO4 to a volume equal to one-half that
of the initial culture volume. Transformants were spread on
large (150 mm diameter) LB agar petri plates containing
antibiotics to provide for selection of cDNA clones
(ampicillin) and maintenance of pAC-LYC (chloramphenicol).
Approximately 10,000 colony forming units were spread on each
plate. Petri plates were incubated at 37°C for 16 hr and then
at room temperature for 2 to 7 days to allow maximum color
development. Plates were screened visually with the aid of an
illuminated 3x magnifier and a low-power stage dissecting
microscope for the rare pale pinkish-yellow to deep-yellow
colonies that could be observed against the background of pink
colonies. A yellow or pinkish-yellow colony color was
taken as presumptive evidence of a cyclization activity.
These colonies were collected with sterile toothpicks
and used to inoculate 3 ml of LB medium in culture tubes for
overnight growth at 37°C with shaking at 225 cycles/min.
Each culture was split into two aliquots in microfuge tubes, and the cells were
harvested by centrifugation at a setting of 5 in an Eppendorf
5415C microfuge. After the liquid was discarded, one pellet was
frozen for later purification of plasmid DNA. The second
pellet was resuspended in 1.5 ml of ethanol by vortex mixing,
and extraction was allowed to proceed in the dark for 15-30 min
with occasional remixing. Insoluble materials were pelleted
by centrifugation at maximum speed for 10 min in a microfuge.
Absorption spectra of the supernatant fluids were recorded
from 350 to 550 nm with a Perkin Elmer Lambda Six
spectrophotometer.
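In this screen, the pigment is identified mainly from the positions of the absorption maxima in the 350-550 nm spectrum. The sketch below illustrates one way to do this. It finds local maxima in a recorded spectrum and compares them with reference peak positions that the user supplies. The function names, the 3-nm tolerance, and the reference values in the usage lines are hypothetical placeholders, not published absorption maxima for any carotenoid.

```python
from typing import Dict, List, Optional, Sequence


def local_maxima(wavelengths: Sequence[float], absorbances: Sequence[float],
                 min_abs: float = 0.0) -> List[float]:
    """Return wavelengths at which absorbance is a local maximum above min_abs."""
    peaks = []
    for i in range(1, len(absorbances) - 1):
        a = absorbances[i]
        if a > absorbances[i - 1] and a >= absorbances[i + 1] and a > min_abs:
            peaks.append(wavelengths[i])
    return peaks


def best_match(peaks: List[float], references: Dict[str, List[float]],
               tol_nm: float = 3.0) -> Optional[str]:
    """Return the first reference whose peaks pair one-to-one (in sorted order)
    with the observed peaks to within tol_nm, or None if none does."""
    for name, ref_peaks in references.items():
        if len(ref_peaks) == len(peaks) and all(
            abs(r - p) <= tol_nm for r, p in zip(sorted(ref_peaks), sorted(peaks))
        ):
            return name
    return None


# Synthetic spectrum; the reference values below are hypothetical placeholders.
wl = [440, 450, 460, 470, 480, 490, 500]
ab = [0.20, 0.50, 0.40, 0.60, 0.30, 0.10, 0.05]
peaks = local_maxima(wl, ab)
refs = {"pigment_A": [451, 469], "pigment_B": [444, 480]}
print(peaks, best_match(peaks, refs))  # [450, 470] pigment_A
```

Real ethanol extracts are noisy and often show shoulders rather than distinct maxima. In practice, the spectrum would need smoothing, and shoulders would have to be assessed separately from true peaks.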

Analysis of Isolated Clones

Eight of the yellow colonies contained β-carotene,
indicating that a single gene product catalyzes both
cyclizations required to form the two β endgroups of the
symmetrical β-carotene from the symmetrical precursor
lycopene. One of the yellow colonies contained a pigment with
the spectrum characteristic of δ-carotene, a monocyclic
carotenoid with a single ε endgroup. Unlike the β cyclase,
this ε cyclase appears unable to carry out a second
cyclization at the other end of the molecule.

The observation that the ε cyclase is unable to form two
cyclic ε endgroups (e.g., as in the bicyclic ε-carotene) illuminates
the mechanism by which plants can coordinate and control the
flow of substrate into carotenoids derived from β-carotene
versus those derived from α-carotene, and can also prevent the
formation of carotenoids with two ε endgroups.
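The endgroup logic above can be expressed as a small state model. This sketch is a simplification that rests on two assumptions. First, each cyclase acts only on a free ψ (acyclic) end. Second, the ε cyclase cannot add an ε ring to a molecule that already carries one. Under these assumptions, the model lists the end products that can be reached from lycopene for each combination of enzymes. The function names and the rule encoding are illustrative only; this is not a kinetic model.

```python
# Simplified endgroup model: each end of the molecule is "psi" (acyclic),
# "beta", or "epsilon". Molecules are stored as sorted 2-tuples so that
# (beta, psi) and (psi, beta) are treated as the same compound.
NAMES = {
    ("psi", "psi"): "lycopene",
    ("beta", "psi"): "gamma-carotene",
    ("beta", "beta"): "beta-carotene",
    ("epsilon", "psi"): "delta-carotene",
    ("beta", "epsilon"): "alpha-carotene",
    ("epsilon", "epsilon"): "epsilon-carotene",
}


def canon(a: str, b: str) -> tuple:
    return tuple(sorted((a, b)))


def next_molecules(mol: tuple, enzymes: set) -> set:
    """Products of a single cyclization of one free psi end."""
    out = set()
    for i in (0, 1):
        end, other = mol[i], mol[1 - i]
        if end != "psi":
            continue  # only acyclic ends can be cyclized
        if "beta" in enzymes:
            out.add(canon("beta", other))
        # Assumed rule (from the text): no second epsilon ring on one molecule.
        if "epsilon" in enzymes and other != "epsilon":
            out.add(canon("epsilon", other))
    return out


def end_products(enzymes: set) -> set:
    """Names of compounds reachable from lycopene that cannot react further."""
    start = ("psi", "psi")
    seen, stack, finals = {start}, [start], set()
    while stack:
        mol = stack.pop()
        nxt = next_molecules(mol, enzymes)
        if not nxt:
            finals.add(NAMES[mol])
        for p in nxt - seen:
            seen.add(p)
            stack.append(p)
    return finals


for enz in ({"beta"}, {"epsilon"}, {"beta", "epsilon"}):
    print(sorted(enz), "->", sorted(end_products(enz)))
# ['beta'] -> ['beta-carotene']
# ['epsilon'] -> ['delta-carotene']
# ['beta', 'epsilon'] -> ['alpha-carotene', 'beta-carotene']
```

Because the ε reaction is blocked only when an ε ring is already present, the model produces α-carotene (β,ε) but never ε-carotene (ε,ε) when both enzymes are present. The model does not predict the relative amounts of each product.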

The availability of the A. thaliana gene encoding the ε
cyclase enables the directed manipulation of plant and algal
species for modification of carotenoid content and
composition. Through inactivation of the ε cyclase, whether
at the gene level by deletion of the gene or by insertional
inactivation, or by reduction of the amount of enzyme formed
(for example, by antisense technology), one may increase the
formation of β-carotene and other pigments derived from it.
Since vitamin A is derived only from carotenoids with β
endgroups, enhancing the production of β-carotene
relative to α-carotene may enhance the nutritional value of crop
plants. Reduction of carotenoids with ε endgroups may also be
of value in modifying the color properties of crop plants and
of specific tissues of these plants. Alternatively, where
production of α-carotene, or of pigments such as lutein that are
derived from α-carotene, is desirable, whether for color
properties, nutritional value, or another reason, one may
overexpress the ε cyclase or express it in specific tissues.
Wherever the agronomic value of a crop is related to pigmentation
provided by carotenoid pigments, the directed manipulation of
expression of the ε cyclase gene and/or production of the
enzyme may be of commercial value.

The amino acid sequence of the A. thaliana ε
cyclase enzyme was deduced from its cDNA sequence. A comparison of the amino acid
sequences of the β and ε cyclase enzymes of Arabidopsis
thaliana (Fig. 13), as predicted by the DNA sequences of the
respective genes (Fig. 4 for the ε cyclase cDNA sequence),
indicates that these two enzymes have many regions of sequence
similarity but are only about 37% identical overall at
the amino acid level. The degree of sequence identity at the
DNA base level, only about 50%, is sufficiently low that
we and others have been unable to detect this gene by
hybridization using the β cyclase as a probe in DNA gel blot
experiments.
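The identity figures above (about 37% at the amino acid level and about 50% at the DNA level) are percentages of matching positions in an alignment. The sketch below computes percent identity for two sequences that have already been aligned. It assumes that gaps are written as "-" and leaves columns containing a gap out of the denominator. This is only one of several conventions in use, so values reported by different alignment tools may differ. The toy sequences in the usage line are invented for illustration and are not taken from Fig. 13.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over alignment columns that contain no gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


print(percent_identity("MEC-VGARN", "MECAVAARQ"))  # 75.0 (6 of 8 ungapped columns)
```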
apparent to one of ordinary skill in the art that many changes
and modifications can be made thereto without departing from
(A) ADDRESSEE: OBLON, SPIVAK, MCCLELLAND, MAIER & NEUSTADT,
(C) CITY: ARLINGTON

(D) STATE: VA 

(E) COUNTRY: USA 
(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(C) CLASSIFICATION:
(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1860 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(ix) FEATURE: 

(A) NAME /KEY: CDS 

(B) LOCATION: 109..1680

(D) OTHER INFORMATION: /product= "E-CYCLASE FROM A.
THALIANA" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
ACAAAAGGAA ATAATTAGAT TCCTCTTTCT GCTTGCTATA CCTTGATAGA ACAATATAAC 60

AATGGTGTAA GTCTTCTCGC TGTATTCGAA ATTATTTGGA GGAGGAAA ATG GAG TGT 117

Met Glu Cys
1

GTT GGG GCT AGG AAT TTC GCA GCA ATG GCG GTT TCA ACA TTT CCG TCA 165

Val Gly Ala Arg Asn Phe Ala Ala Met Ala Val Ser Thr Phe Pro Ser
5 10 15

TGG AGT TGT CGA AGG AAA TTT CCA GTG GTT AAG AGA TAC AGC TAT AGG 213

Trp Ser Cys Arg Arg Lys Phe Pro Val Val Lys Arg Tyr Ser Tyr Arg
20 25 30 35

AAT ATT CGT TTC GGT TTG TGT AGT GTC AGA GCT AGC GGC GGC GGA AGT 261
Asn Ile Arg Phe Gly Leu Cys Ser Val Arg Ala Ser Gly Gly Gly Ser

40 45 50

TCC GGT AGT GAG AGT TGT GTA GCG GTG AGA GAA GAT TTC GCT GAC GAA 309

Ser Gly Ser Glu Ser Cys Val Ala Val Arg Glu Asp Phe Ala Asp Glu
55 60 65

GAA GAT TTT GTG AAA GCT GGT GGT TCT GAG ATT CTA TTT GTT CAA ATG 357
Glu Asp Phe Val Lys Ala Gly Gly Ser Glu Ile Leu Phe Val Gln Met
70 75 80

CAG CAG AAC AAA GAT ATG GAT GAA CAG TCT AAG CTT GTT GAT AAG TTG 405

Gln Gln Asn Lys Asp Met Asp Glu Gln Ser Lys Leu Val Asp Lys Leu
85 90 95

CCT CCT ATA TCA ATT GGT GAT GGT GCT TTG GAT CAT GTG GTT ATT GGT 453
Pro Pro Ile Ser Ile Gly Asp Gly Ala Leu Asp His Val Val Ile Gly
100 105 110 115

TGT GGT CCT GCT GGT TTA GCC TTG GCT GCA GAA TCA GCT AAG CTT GGA 501 
Cys Gly Pro Ala Gly Leu Ala Leu Ala Ala Glu Ser Ala Lys Leu Gly 
120 125 130 



TTA AAA GTT GGA CTC ATT GGT CCA GAT CTT CCT TTT ACT AAC AAT TAC 549
Leu Lys Val Gly Leu Ile Gly Pro Asp Leu Pro Phe Thr Asn Asn Tyr
135 140 145

GGT GTT TGG GAA GAT GAA TTC AAT GAT CTT GGG CTG CAA AAA TGT ATT 597

Gly Val Trp Glu Asp Glu Phe Asn Asp Leu Gly Leu Gln Lys Cys Ile
150 155 160

GAG CAT GTT TGG AGA GAG AGT ATT GTG TAT CTG GAT GAT GAC AAG CCT 645

Glu His Val Trp Arg Glu Thr Ile Val Tyr Leu Asp Asp Asp Lys Pro
165 170 175

ATT ACC ATT GGC CGT GCT TAT GGA AGA GTT AGT CGA CGT TTG CTC CAT 693

Ile Thr Ile Gly Arg Ala Tyr Gly Arg Val Ser Arg Arg Leu Leu His

180 185 190 195
GAG GAG CTT TTG AGG AGG TGT GTC GAG TCA GGT GTC TCG TAC CTT AGC 741
Glu Glu Leu Leu Arg Arg Cys Val Glu Ser Gly Val Ser Tyr Leu Ser

200 205 210

TCG AAA GTT GAC AGC ATA ACA GAA GCT TCT GAT GGC CTT AGA CTT GTT 789

Ser Lys Val Asp Ser Ile Thr Glu Ala Ser Asp Gly Leu Arg Leu Val
215 220 225

GCT TGT GAC GAC AAT AAC GTC ATT CCC TGC AGG CTT GCC ACT GTT GCT 837

Ala Cys Asp Asp Asn Asn Val Ile Pro Cys Arg Leu Ala Thr Val Ala
230 235 240

TCT GGA GCA GCT TCG GGA AAG CTC TTG CAA TAC GAA GTT GGT GGA CCT 885

Ser Gly Ala Ala Ser Gly Lys Leu Leu Gln Tyr Glu Val Gly Gly Pro
245 250 255

AGA GTC TGT GTG CAA ACT GCA TAC GGC GTG GAG GTT GAG GTG GAA AAT 933
Arg Val Cys Val Gln Thr Ala Tyr Gly Val Glu Val Glu Val Glu Asn
260 265 270 275

ACT CCA TAT GAT CCA GAT CAA ATG GTT TTC ATG GAT TAC AGA GAT TAT 981

Ser Pro Tyr Asp Pro Asp Gln Met Val Phe Met Asp Tyr Arg Asp Tyr
280 285 290

ACT AAC GAG AAA GTT CGG AGC TTA GAA GCT GAG TAT CCA ACG TTT CTG 1029

Thr Asn Glu Lys Val Arg Ser Leu Glu Ala Glu Tyr Pro Thr Phe Leu
295 300 305

TAC GCC ATG CCT ATG ACA AAG TCA AGA CTC TTC TTC GAG GAG ACA TGT 1077
Tyr Ala Met Pro Met Thr Lys Ser Arg Leu Phe Phe Glu Glu Thr Cys
310 315 320

TTG GCC TCA AAA GAT GTC ATG CCC TTT GAT TTG CTA AAA ACG AAG CTC 1125

Leu Ala Ser Lys Asp Val Met Pro Phe Asp Leu Leu Lys Thr Lys Leu

325 330 335

ATG TTA AGA TTA GAT ACA CTC GGA ATT CGA ATT CTA AAG ACT TAC GAA 1173

Met Leu Arg Leu Asp Thr Leu Gly Ile Arg Ile Leu Lys Thr Tyr Glu
340 345 350 355

GAG GAG TGG TCC TAT ATC CCA GTT GGT GGT TCC TTG CCA AAC ACC GAA 1221
Glu Glu Trp Ser Tyr Ile Pro Val Gly Gly Ser Leu Pro Asn Thr Glu
360 365 370

CAA AAG AAT CTC GCC TTT GGT GCT GCC GCT AGC ATG GTA CAT CCC GCA 1269

Gln Lys Asn Leu Ala Phe Gly Ala Ala Ala Ser Met Val His Pro Ala
375 380 385

ACA GGC TAT TCA GTT GTG AGA TCT TTG TCT GAA GCT CCA AAA TAT GCA 1317
Thr Gly Tyr Ser Val Val Arg Ser Leu Ser Glu Ala Pro Lys Tyr Ala
390 395 400

TCA GTC ATC GCA GAG ATA CTA AGA GAA GAG ACT ACC AAA CAG ATC AAC 1365

Ser Val Ile Ala Glu Ile Leu Arg Glu Glu Thr Thr Lys Gln Ile Asn
405 410 415

AGT AAT ATT TCA AGA CAA GCT TGG GAT ACT TTA TGG CCA CCA GAA AGG 1413

Ser Asn Ile Ser Arg Gln Ala Trp Asp Thr Leu Trp Pro Pro Glu Arg

420 425 430 435

AAA AGA CAG AGA GCA TTC TTT CTC TTT GGT CTT GCA CTC ATA GTT CAA 1461
Lys Arg Gln Arg Ala Phe Phe Leu Phe Gly Leu Ala Leu Ile Val Gln
440 445 450

TTC GAT ACC GAA GGC ATT AGA AGC TTC TTC CGT ACT TTC TTC CGC CTT 1509
Phe Asp Thr Glu Gly Ile Arg Ser Phe Phe Arg Thr Phe Phe Arg Leu
455 460 465

CCA AAA TGG ATG TGG CAA GGG TTT CTA GGA TCA ACA TTA ACA TCA GGA 1557
Pro Lys Trp Met Trp Gln Gly Phe Leu Gly Ser Thr Leu Thr Ser Gly
470 475 480

GAT CTC GTT CTC TTT GCT TTA TAC ATG TTC GTC ATT TCA CCA AAC AAT 1605
Asp Leu Val Leu Phe Ala Leu Tyr Met Phe Val Ile Ser Pro Asn Asn
485 490 495

TTG AGA AAA GGT CTC ATC AAT CAT CTC ATC TCT GAT CCA ACC GGA GCA 1653

Leu Arg Lys Gly Leu Ile Asn His Leu Ile Ser Asp Pro Thr Gly Ala
500 505 510 515

ACC ATG ATA AAA ACC TAT CTC AAA GTA TGATTTACTT ATCAACTCTT 1700
Thr Met Ile Lys Thr Tyr Leu Lys Val
520

AGGTTTGTGT ATATATATGT TGATTTATCT GAATAATCGA TCAAAGAATG GTATGTGGGT 1760

TACTAGGAAG TTGGAAACAA ACATGTATAG AATCTAAGGA GTGATCGAAA TGGAGATGGA 1820

AACGAAAAGA AAAAAATCAG TCTTTGTTTT GTGGTTAGTG 1860
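The feature table for SEQ ID NO: 1 places the coding sequence at nucleotides 109..1680. The short check sketched below uses the standard genetic code (NCBI translation table 1). It confirms that this interval spans 1680 - 109 + 1 = 1572 nt, i.e. 524 codons, which matches the 524-residue length given for SEQ ID NO: 2. The TGA stop codon at 1681-1683 falls just outside the annotated interval. The helper names are illustrative. Only the first 57 nt of the CDS, copied from the listing above, are translated here.

```python
# Standard genetic code (NCBI translation table 1); codon index = 16*i + 4*j + k,
# where i, j, k are the positions of the three bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string; unknown codons (e.g. with N) become 'X'."""
    dna = dna.upper().replace(" ", "")
    if len(dna) % 3:
        raise ValueError("length is not a multiple of 3")
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna), 3))


def cds_codons(start: int, end: int) -> int:
    """Number of codons in a 1-based, inclusive CDS interval start..end."""
    length = end - start + 1
    if length % 3:
        raise ValueError("CDS length is not a multiple of 3")
    return length // 3


print(cds_codons(109, 1680))  # 524
first_19 = "ATG GAG TGT GTT GGG GCT AGG AAT TTC GCA GCA ATG GCG GTT TCA ACA TTT CCG TCA"
print(translate(first_19))  # MECVGARNFAAMAVSTFPS (Met Glu Cys Val Gly ...)
```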

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 524 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Glu Cys Val Gly Ala Arg Asn Phe Ala Ala Met Ala Val Ser Thr
1 5 10 15

Phe Pro Ser Trp Ser Cys Arg Arg Lys Phe Pro Val Val Lys Arg Tyr
20 25 30

Ser Tyr Arg Asn Ile Arg Phe Gly Leu Cys Ser Val Arg Ala Ser Gly
35 40 45

Gly Gly Ser Ser Gly Ser Glu Ser Cys Val Ala Val Arg Glu Asp Phe
50 55 60

Ala Asp Glu Glu Asp Phe Val Lys Ala Gly Gly Ser Glu Ile Leu Phe
65 70 75 80

Val Gln Met Gln Gln Asn Lys Asp Met Asp Glu Gln Ser Lys Leu Val
85 90 95

Asp Lys Leu Pro Pro Ile Ser Ile Gly Asp Gly Ala Leu Asp His Val
100 105 110

Val Ile Gly Cys Gly Pro Ala Gly Leu Ala Leu Ala Ala Glu Ser Ala
115 120 125

Lys Leu Gly Leu Lys Val Gly Leu Ile Gly Pro Asp Leu Pro Phe Thr
130 135 140

Asn Asn Tyr Gly Val Trp Glu Asp Glu Phe Asn Asp Leu Gly Leu Gln
145 150 155 160

Lys Cys Ile Glu His Val Trp Arg Glu Thr Ile Val Tyr Leu Asp Asp
165 170 175

Asp Lys Pro Ile Thr Ile Gly Arg Ala Tyr Gly Arg Val Ser Arg Arg
180 185 190

Leu Leu His Glu Glu Leu Leu Arg Arg Cys Val Glu Ser Gly Val Ser
195 200 205

Tyr Leu Ser Ser Lys Val Asp Ser Ile Thr Glu Ala Ser Asp Gly Leu
210 215 220

Arg Leu Val Ala Cys Asp Asp Asn Asn Val Ile Pro Cys Arg Leu Ala
225 230 235 240

Thr Val Ala Ser Gly Ala Ala Ser Gly Lys Leu Leu Gln Tyr Glu Val
245 250 255

Gly Gly Pro Arg Val Cys Val Gln Thr Ala Tyr Gly Val Glu Val Glu
260 265 270

Val Glu Asn Ser Pro Tyr Asp Pro Asp Gln Met Val Phe Met Asp Tyr
275 280 285

Arg Asp Tyr Thr Asn Glu Lys Val Arg Ser Leu Glu Ala Glu Tyr Pro
290 295 300

Thr Phe Leu Tyr Ala Met Pro Met Thr Lys Ser Arg Leu Phe Phe Glu
305 310 315 320

Glu Thr Cys Leu Ala Ser Lys Asp Val Met Pro Phe Asp Leu Leu Lys
325 330 335

Thr Lys Leu Met Leu Arg Leu Asp Thr Leu Gly Ile Arg Ile Leu Lys
340 345 350

Thr Tyr Glu Glu Glu Trp Ser Tyr Ile Pro Val Gly Gly Ser Leu Pro
355 360 365

Asn Thr Glu Gln Lys Asn Leu Ala Phe Gly Ala Ala Ala Ser Met Val
370 375 380

His Pro Ala Thr Gly Tyr Ser Val Val Arg Ser Leu Ser Glu Ala Pro
385 390 395 400

Lys Tyr Ala Ser Val Ile Ala Glu Ile Leu Arg Glu Glu Thr Thr Lys
405 410 415

Gln Ile Asn Ser Asn Ile Ser Arg Gln Ala Trp Asp Thr Leu Trp Pro
420 425 430

Pro Glu Arg Lys Arg Gln Arg Ala Phe Phe Leu Phe Gly Leu Ala Leu
435 440 445

Ile Val Gln Phe Asp Thr Glu Gly Ile Arg Ser Phe Phe Arg Thr Phe
450 455 460

Phe Arg Leu Pro Lys Trp Met Trp Gln Gly Phe Leu Gly Ser Thr Leu
465 470 475 480

Thr Ser Gly Asp Leu Val Leu Phe Ala Leu Tyr Met Phe Val Ile Ser
485 490 495

Pro Asn Asn Leu Arg Lys Gly Leu Ile Asn His Leu Ile Ser Asp Pro
500 505 510

Thr Gly Ala Thr Met Ile Lys Thr Tyr Leu Lys Val
515 520

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 956 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

GCTCTTTCTC CTCCTCCTCT ACCGATTTCC GACTCCGCCT CCCGAAATCC TTATCCGGAT 60 

TCTCTCCGTC TCTTCGATTT AAACGCTTTT CTGTCTGTTA CGTCGTCGAA GAACGGAGAC 120 

AGAATTCTCC GATTGAGAAC GATGAGAGAC CGGAGAGCAC GAGCTCCACA AACGCTATAG 180

ACGCTGAGTA TCTGGCGTTG CGTTTGGCGG AGAAATTGGA GAGGAAGAAA TCGGAGAGGT 240

CCACTTATCT AATCGCTGCT ATGTTGTCGA GCTTTGGTAT CACTTCTATG GCTGTTATGG 300

CTGTTTACTA CAGATTCTCT TGGCAAATGG AGGGAGGTGA GATCTCAATG TTGGAAATGT 360



TTGGTACATT TGCTCTCTCT GTTGGTGCTG CTGTTGGTAT GGAATTCTGG GCAAGATGGG 420

CTCATAGAGC TCTGTGGCAC GCTTCTCTAT GGAATATGCA TGAGTCACAT CACAAACCAA 480

GAGAAGGACC GTTTGAGCTA AACGATGTTT TTGCTATAGT GAACGCTGGT CCAGCGATTG 540

GTCTCCTCTC TTATGGATTC TTCAATAAAG GACTCGTTCC TGGTCTCTGC TTTGGCGCCG 600

GGTTAGGCAT TACGGTGTTT GGAATCGCCT ACATGTTTGT CCACGATGGT CTCGTGCACA 660

AGCGTTTCCC TGTAGGTCCC ATCGCCGACG TCCCTTACCT CCGAAAGGTC GCCGCCGCTC 720

ACCAGCTACA TCACACAGAC AAGTTCAATG GTGTACCATA TGGACTGTTT CTTGGACCCA 780

AGGAATTGGA AGAAGTTGGA AGTTAGATAA GGAGATTAGT CGGAGAATCA 840

AATCATACAA AAAGGCCTCG GGCTCCGGGT CGAGTTCGAG TTCTTGACTT TAAACAAGTT 900

TTAAATCCCA AATTCTTTTT TTGTCTTCTG TCATTATGAT CATCTTAAGA CGGTCT 956



(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 294 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Ser Phe Ser Ser Ser Ser Thr Asp Phe Arg Leu Arg Leu Pro Lys Ser
1 5 10 15

Leu Ser Gly Phe Ser Pro Ser Leu Arg Phe Lys Arg Phe Ser Val Cys
20 25 30

Tyr Val Val Glu Glu Arg Arg Gln Asn Ser Pro Ile Glu Asn Asp Glu
35 40 45

Arg Pro Glu Ser Thr Ser Ser Thr Asn Ala Ile Asp Ala Glu Tyr Leu
50 55 60

Ala Leu Arg Leu Ala Glu Lys Leu Glu Arg Lys Lys Ser Glu Arg Ser
65 70 75 80

Thr Tyr Leu Ile Ala Ala Met Leu Ser Ser Phe Gly Ile Thr Ser Met
85 90 95

Ala Val Met Ala Val Tyr Tyr Arg Phe Ser Trp Gln Met Glu Gly Gly
100 105 110

Glu Ile Ser Met Leu Glu Met Phe Gly Thr Phe Ala Leu Ser Val Gly
115 120 125

Ala Ala Val Gly Met Glu Phe Trp Ala Arg Trp Ala His Arg Ala Leu
130 135 140

Trp His Ala Ser Leu Trp Met Asn His Glu Ser His His Lys Pro Arg
145 150 155 160

Glu Gly Pro Phe Glu Leu Asn Asp Val Phe Ala Ile Val Asn Ala Gly
165 170 175

Pro Ala Ile Gly Leu Leu Ser Tyr Gly Phe Phe Asn Lys Gly Leu Val
180 185 190

Pro Gly Leu Cys Phe Gly Ala Gly Leu Gly Ile Thr Val Phe Gly Ile
195 200 205

Ala Tyr Met Phe Val His Asp Gly Leu Val His Lys Arg Phe Pro Val
210 215 220

Gly Pro Ile Ala Asp Val Pro Tyr Leu Arg Lys Val Ala Ala Ala His
225 230 235 240

Gln Leu His His Thr Asp Lys Phe Asn Gly Val Pro Tyr Gly Leu Phe
245 250 255

Leu Gly Pro Lys Glu Leu Glu Glu Val Gly Gly Asn Glu Glu Leu Asp
260 265 270

Lys Glu Ile Ser Arg Arg Ile Lys Ser Tyr Lys Lys Ala Ser Gly Ser
275 280 285

Gly Ser Ser Ser Ser Ser
290

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 162 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

Met Thr Gln Phe Leu Ile Val Val Ala Thr Val Leu Val Met Glu Leu
1 5 10 15

Thr Ala Tyr Ser Val His Arg Trp Ile Met His Gly Pro Leu Gly Trp
20 25 30

Gly Trp His Lys Ser His His Glu Glu His Asp His Ala Leu Glu Lys
35 40 45

Asn Asp Leu Tyr Gly Val Val Phe Ala Val Leu Ala Thr Ile Leu Phe
50 55 60

Thr Val Gly Ala Tyr Trp Trp Pro Val Leu Trp Trp Ile Ala Leu Gly
65 70 75 80

Met Thr Val Tyr Gly Leu Ile Tyr Phe Ile Leu His Asp Gly Leu Val
85 90 95

His Gln Arg Trp Pro Phe Arg Tyr Ile Pro Arg Arg Gly Tyr Phe Arg
100 105 110

Arg Leu Tyr Gln Ala His Arg Leu His His Ala Val Glu Gly Arg Asp
115 120 125

His Cys Val Ser Phe Gly Phe Ile Tyr Ala Pro Pro Val Asp Lys Leu
130 135 140

Lys Gln Asp Leu Lys Arg Ser Gly Val Leu Arg Pro Gln Asp Glu Arg
145 150 155 160

Pro Ser



(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 175 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Met Leu Asn Ser Leu Ile Val Ile Leu Ser Val Ile Ala Met Glu Gly
1 5 10 15

Ile Ala Ala Phe Thr His Arg Tyr Ile Met His Gly Trp Gly Trp Arg
20 25 30

Trp His Glu Ser His His Thr Pro Arg Lys Gly Val Phe Glu Leu Asn
35 40 45

Asp Leu Phe Ala Val Val Phe Ala Gly Val Ala Ile Ala Leu Ile Ala
50 55 60

Val Gly Thr Ala Gly Val Trp Pro Leu Gln Trp Ile Gly Cys Gly Met
65 70 75 80

Thr Val Tyr Gly Leu Leu Tyr Phe Leu Val His Asp Gly Leu Val His
85 90 95

Gln Arg Trp Pro Phe His Trp Ile Pro Arg Arg Gly Tyr Leu Lys Arg
100 105 110

Leu Tyr Val Ala His Arg Leu His His Ala Val Arg Gly Arg Glu Gly
115 120 125

Cys Val Ser Phe Gly Phe Ile Tyr Ala Arg Lys Pro Ala Asp Leu Gln
130 135 140

Ala Ile Leu Arg Glu Arg His Gly Arg Pro Pro Lys Arg Asp Ala Ala
145 150 155 160

Lys Asp Arg Pro Asp Ala Ala Ser Pro Ser Ser Ser Ser Pro Glu
165 170 175

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 175 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

Met Leu Trp Ile Trp Asn Ala Leu Ile Val Phe Val Thr Val Ile Gly
1 5 10 15

Met Glu Val Ile Ala Ala Leu Ala His Lys Tyr Ile Met His Gly Trp
20 25 30

Gly Trp Gly Trp His Leu Ser His His Glu Pro Arg Lys Gly Ala Phe
35 40 45

Glu Val Asn Asp Leu Tyr Ala Val Val Phe Ala Ala Leu Ser Ile Leu
50 55 60

Leu Ile Tyr Leu Gly Ser Thr Gly Met Trp Pro Leu Gln Trp Ile Gly
65 70 75 80

Ala Gly Met Thr Ala Tyr Gly Leu Leu Tyr Phe Met Val His Asp Gly
85 90 95

Leu Val His Gln Arg Trp Pro Phe Arg Tyr Ile Pro Arg Lys Gly Tyr
100 105 110

Leu Lys Arg Leu Tyr Met Ala His Arg Met His His Ala Val Arg Gly
115 120 125

Lys Glu Gly Cys Val Ser Phe Gly Phe Leu Tyr Ala Pro Pro Leu Ser
130 135 140

Lys Leu Gln Ala Thr Leu Arg Glu Arg His Gly Ala Arg Ala Gly Ala
145 150 155 160

Ala Arg Asp Ala Gln Gly Gly Glu Asp Glu Pro Ala Ser Gly Lys
165 170 175

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 162 amino acids
(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Met Thr Asn Phe Leu Ile Val Val Ala Thr Val Leu Val Met Glu Leu
1 5 10 15

Thr Ala Tyr Ser Val His Arg Trp Ile Met His Gly Pro Leu Gly Trp
20 25 30

Gly Trp His Lys Ser His His Glu Glu His Asp His Ala Leu Glu Lys
35 40 45

Asn Asp Leu Tyr Gly Leu Val Phe Ala Val Ile Ala Thr Val Leu Phe
50 55 60

Thr Val Gly Trp Ile Trp Ala Pro Val Leu Trp Trp Ile Ala Leu Gly
65 70 75 80

Met Thr Val Tyr Gly Leu Ile Tyr Phe Val Leu His Asp Gly Leu Val
85 90 95

His Trp Arg Trp Pro Phe Arg Tyr Ile Pro Arg Lys Gly Tyr Ala Arg
100 105 110

Arg Leu Tyr Gln Ala His Arg Leu His His Ala Val Glu Gly Arg Asp
115 120 125

His Cys Val Ser Phe Gly Phe Ile Tyr Ala Pro Pro Val Asp Lys Leu
130 135 140

Lys Gln Asp Leu Lys Met Ser Gly Val Leu Arg Ala Glu Ala Gln Glu
145 150 155 160

Arg Thr



(2) INFORMATION FOR SEQ ID NO: 9:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 954 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

CCACGGGTCC GCCTCCCCGT TTTTTTCCGA TCCGATCTCC GGTGCCGAGG ACTCAGCTGT 60

TTTCTCAGCC GTCACCATGA CCGATTCTAA CGATGCTGGA ATGGATGCTG 120

TTCAGAGACC [illegible] AAGACGAAT GCATTCTCGT TGATGAAAAT AATCGTGTGG 180

[illegible] AACTGTCATC TGATGGAAAA GATTGTNGCT GAGAATTTAC 240

[illegible] TTTCAGTGTG TTTTTATTCA ACTCCAAGTA TGAGTTGCTT CTCCAGCAAC 300

[illegible] TTCCCACTTG TGTGGACAAA CACTTGTTGC AGCCATCCTC 360

TTTACCGTGA ATCCGAGCTT ATTGAAGAGA ATGTGCTTGG TGTAAGAAAT GCCGCACAAA 420

GGAAGCTTTT CGATGAGCTC GGTATTGTAG CAGAAGATGT ACCAGTCGAT GAGTTCACTC 480

CCTTGGGACG CATGCTTTAC AAGGCACCTT CTGATGGGAA ATGGGGAGAG CACGAAGTTG 540

ACTATCTACT CTTCATCGTG CGGGATGTGA AGCTTCAACC AAACCCAGAT GAAGTGGCTG 600

AGATCAAGTA CGTGAGCAGG GAAGAGCTTA AGGAGCTGGT GAAGAAAGCA GATGCTGGCG 660

ATGAAGCTGT GAANCTATCT CCATGGTTCA GATTGGTGGT GGATAATTTC TTGATGAAGT 720

GGTGGGATCA TGTTGAGAAA GGAACTATCA CTGAAGCTGC AGACATGAAA ACCATTCACA 780

AGCTCTGAAC TTTCCATAAG TTTTGGATCT TCCCCTTCCC ATAATAAAAT TAAGAGATGA 840

GACTTTTATT GATTACAGAC AAAACTGGCA ACAAAATCTA TTCCTAGGAT TTTTTTTTGC 900

TTTTTATTTA CTTTTGATTC ATCTCTAGTT TAGTTTTCAT CTTAAAAAAA AAAA 954


(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 996 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

CACCAATGTC TGTTTCTTCT TTATTTAATC TCCCATTGAT TCGCCTCAGA TCTCTCGCTC 60

TTTCGTCTTC TTTTTCTTCT TTCCGATTTG CCCATCGTCC TCTGTCATCG ATTTCACCGA 120

GAAAGTTACC GAATTTTCGT GCTTTCTCTG GTACCGCTAT GACAGATACT AAAGATGCTG 180

GTATGGATGC TGTTCAGAGA CGTCTCATGT TTGAGGATGA ATGCATTCTT GTTGATGAAA 240

CTGATCGTGT TGTGGGGCAT GTCAGCAAGT ATAATTGTCA TCTGATGGAA AATATTGAAG 300

CCAAGAATTT GCTGCACAGG GCTTTTAGTG TATTTTTATT CAACTCGAAG TATGAGTTGC 360

TTCTCCAGCA AAGGTCAAAC ACAAAGGTTA CGTTCCCTCT AGTGTGGACT AACACTTGTT 420

GCAGCCATCC TCTTTACCGT GAATCAGAGC TTATCCAGGA CAATGCACTA GGTGTGAGGA 480

ATGCTGCACA AAGAAAGCTT CTCGATGAGC TTGGTATTGT AGCTGAAGAT GTACCAGTCG 540

ATGAGTTCAC TCCCTTGGGA CGTATGCTGT ACAAGGCTCC TTCTGATGGC AAATGGGGAG 600

AGCATGAACT TGATTACTTG CTCTTCATCG TGCGAGACGT GAAGGTTCAA CCAAACCCAG 660

ATGAAGTAGC TGAGATCAAG TATGTGAGCC GGGAAGAGCT GAAGGAGCTG GTGAAGAAAG 720

CAGATGCAGG TGAGGAAGGT TTGAAACTGT CACCATGGTT CAGATTGGTG GTGGACAATT 780

TCTTGATGAA GTGGTGGGAT CATGTTGAGA AAGGAACTTT GGTTGAAGCT ATAGACATGA 840

AAACCATCCA CAAACTCTGA ACATCTTTTT TTAAAGTTTT TAAATCAATC AACTTTCTCT 900

TCATCATTTT TATCTTTTCG ATGATAATAA TTTGGGATAT GTGAGACACT TACAAAACTT 960

CCAAGCACCT CAGGCAATAA TAAAGTTTGC GGCCGC 996
(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1165 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

CTCGGTAGCT GGCCACAATC GCTATTTGGA ACCTGGCCCG GCGGCAGTCC GATGCCGCGA 60

TGCTTCGTTC GTTGCTCAGA GGCCTCACGC ATATCCCCCG CGTGAACTCC GCCCAGCAGC 120

CCAGCTGTGC ACACGCGCGA CTCCAGTTTA AGCTCAGGAG CATGCAGATG ACGCTCATGC 180

AGCCCAGCAT CTCAGCCAAT CTGTCGCGCG CCGAGGACCG CACAGACCAC ATGAGGGGTG 240

CAAGCACCTG GGCAGGCGGG CAGTCGCAGG ATGAGCTGAT GCTGAAGGAC GAGTGCATCT 300

TGGTGGATGT TGAGGACAAC ATCACAGGCC ATGCCAGCAA GCTGGAGTGT CACAAGTTCC 360

TACCACATCA GCCTGCAGGC CTGCTGCACC GGGCCTTCTC TGTGTTCCTG TTTGAGGATC 420

AGGGGCGACT GCTGCTGCAA CAGCGTGCAC GCTCAAAAAT CACCTTCCCA AGTGTGTGGA 480

CGAACACCTG CTGCAGCCAC CCTTTACATG GGCAGACCCC AGATGAGGTG GACCAACTAA 540

GCCAGGTGGC CGACGGAACA GTACCTGGCG CAAAGGCTGC TGCCATCCGC AAGTTGGAGC 600

ACGAGCTGGG GATACCAGCG CACCAGCTGC CGGCAAGCGC GTTTCGCTTC CTCACGCGTT 660

TGCACTACTG TGCCGCGGAC GTGCAGCCAG CTGCGACACA ATCAGCGCTC TGGGGCGAGC 720

ACGAAATGGA CTACATCTTG TTCATCCGGG CCAACGTCAC CTTGGCGCCC AACCCTGACG 780

AGGTGGACGA AGTCAGGTAC GTGACGCAAG AGGAGCTGCG GCAGATGATG CAGCCGGACA 840

ACGGGCTGCA ATGGTCGCCG TGGTTTCGCA TCATCGCCGC GCGCTTCCTT GAGGGTTGGT 900

GGGCTGACCT GGACGCGGCC CTAAACACTG ACAAACACGA GGATTGGGGA ACGGTGCATC 960

ACATCAACGA AGCGTGAAAG CAGAAGCTGC AGGATGTGAA GACACGTCAT GGGGTGGAAT 1020

TGCGTACTTG GCAGCTTCGT ATCTCCTTTT TCTGAGACTG AACCTGCAGT 1080

AAGGTCAGGT AAAATGGCTC GATAAAATGT ACCGTCACTT TTTGTCGCGT ATACTGAACT 1140

CCAAGAGGTC AAAAAAAAAA AAAAA 1165


(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1135 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

CTCGGTAGCT GGCCACAATC GCTATTTGGA ACCTGGCCCG GGGGCAGTCC GATGCCGGGA 60

TGCTTCGTTC GTTGCTCAGA GGCCTCACGC ATATCCCGCG CGTGAACTCC GCCCAGCAGC 120

CCAGCTGTGC ACACGCGCGA CTCCAGTTTA AGCTCAGGAG CATGCAGCTG CTTTCCGAGG 180

ACCGCACAGA CCACATGAGG GGTGCAAGCA CCTGGGCAGG CGGGCAGTCG CAGGATGAGC 240

TGATGCTGAA GGACGAGTGC ATCTTGGTAG ATGTTGAGGA CAACATCACA GGCCATGCCA 300

GCAAGCTGGA GTGTCACAAG TTCCTACCAC ATCAGCCTGC AGGCCTGCTG CACCGGGCCT 360

TCTCTGTGTT CCTGTTTGAC GATCAGGGGC GACTGCTGCT GCAACAGCGT GCACGCTCAA 420

AAATCACCTT CCCAAGTGTG TGGACGAACA CCTGCTGCAG CCACCCTTTA CATGGGCAGA 480

CCCCAGATGA GGTGGACCAA CTAAGCCAGG TGGCCGACGG AACAGTACCT GGCGCAAAGG 540

CTGCTGCCAT CCGCAAGTTG GAGCACGAGC TGGGGATACC AGCGCACCAG CTGCCGGCAA 600

GCGCGTTTCG CTTCCTCACG CGTTTGCACT ACTGTGCCGC GGACGTGCAG CCAGCTGCGA 660

CACAATCAGC GCTCTGGGGC GAGCACGAAA TGGACTACAT CTTGTTCATC CGGGCCAACG 720

TCACCTTGGC GCCCAACCCT GACGAGGTGG ACGAAGTCAG GTACGTGACG CAAGAGGAGC 780

TGCGGCAGAT GATGCAGCCG GACAACGGGC TTCAATGGTC GCCGTGGTTT CGCATCATCG 840

CCGCGCGCTT CCTTGAGCGT TGGTGGGCTG ACCTGGACGC GGCCCTAAAC ACTGACAAAC 900

ACGAGGATTG GGGAACGGTG CATCACATCA ACGAAGCGTG AAGGCAGAAG CTGCAGGATG 960

TGAAGACACG TCATGGGGTG GAATTGCGTA CTTGGCAGCT TCGTATCTCC TTTTTCTGAG 1020

ACTGAACCTG CAGAGCTAGA GTCAATGGTG CATCATATTC ATCGTCTCTC TTTTGTTTTA 1080

GACTAATCTG TAGCTAGAGT CACTGATGAA TCCTTTACAA CTTTCAAAAA AAAAA 1135
(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 960 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

CCAAAAACAA CTCAAATCTC CTCCGTCGCT CTTACTCCGC CATGGGTGAC GACTCCGGCA 60

TGGATGCTGT TCAGCGACGT CTCATGTTTG ACGATGAATG CATTTTGGTG GATGAGTGTG 120

ACAATGTGGT GGGACATGAT ACCAAATACA ATTGTCACTT GATGGAGAAG ATTGAAACAG 180

GTAAAATGCT GCACAGAGCA TTCAGCGTTT TTCTATTCAA TTCAAAATAC GAGTTACTTC 240

TTCAGCAACG GTCTGCAACC AAGGTGACAT TTCCTTTAGT ATGGACCAAC ACCTGTTGCA 300

GCCATCCACT CTACAGAGAA TCCGAGCTTG TTCCCGAAAC GCCTGAGAGA ATGCTGCACA 360




GAGGANNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 420

NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 480

NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 540

NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 600

NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN 660

NNNNNNNNNN NNNNNNNNNN TCATGTGCAA AAGGGTACAC TCACTGAATG CTNTTTGATA 720

TGAAAACCAT ACACAAGCTG ATATAGAAAC ACACCCTCAA CCGAAAAGCA AGCCTAATAA 780

TTCGGGTTGG GTCGGGTCTA CCATCAATTG NTTTTTTCTT TTAACAACTT TTAATCTCTA 840

TTTGAGCATG TTGATTCTTG TCTTTTGTGT GTAAGATTTT GGGTTTCGTT TCAGTTGTAA 900

TAATGAACCA TTGATGGTTT GCNNTTTCAA GTTCCTATCG ACATGTAGTG ATCTAAAAAA 960

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 305 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

Met Leu Arg Ser Leu Leu Arg Gly Leu Thr His Ile Pro Arg Val Asn
1 5 10 15

Ser Ala Gln Gln Pro Ser Cys Ala His Ala Arg Leu Gln Phe Lys Leu
20 25 30

Arg Ser Met Gln Met Thr Leu Met Gln Pro Ser Ile Ser Ala Asn Leu
35 40 45

Ser Arg Ala Glu Asp Arg Thr Asp His Met Arg Gly Ala Ser Thr Trp
50 55 60

Ala Gly Gly Gln Ser Gln Asp Glu Leu Met Leu Lys Asp Glu Cys Ile
65 70 75 80

Leu Val Asp Val Glu Asp Asn Ile Thr Gly His Ala Ser Lys Leu Glu
85 90 95

Cys His Lys Phe Leu Pro His Gln Pro Ala Gly Leu Leu His Arg Ala
100 105 110



Phe Ser Val Phe Leu Phe Asp Asp Gln Gly Arg Leu Leu Leu Gln Gln
115 120 125

Arg Ala Arg Ser Lys Ile Thr Phe Pro Ser Val Trp Thr Asn Thr Cys
130 135 140

Cys Ser His Pro Leu His Gly Gln Thr Pro Asp Glu Val Asp Gln Leu
145 150 155 160

Ser Gln Val Ala Asp Gly Thr Val Pro Gly Ala Lys Ala Ala Ala Ile
165 170 175

Arg Lys Leu Glu His Glu Leu Gly Ile Pro Ala His Gln Leu Pro Ala
180 185 190

Ser Ala Phe Arg Phe Leu Thr Arg Leu His Tyr Cys Ala Ala Asp Val
195 200 205

Gln Pro Ala Ala Thr Gln Ser Ala Leu Trp Gly Glu His Glu Met Asp
210 215 220

Tyr Ile Leu Phe Ile Arg Ala Asn Val Thr Leu Ala Pro Asn Pro Asp
225 230 235 240

Glu Val Asp Glu Val Arg Tyr Val Thr Gln Glu Glu Leu Arg Gln Met
245 250 255

Met Gln Pro Asp Asn Gly Leu Gln Trp Ser Pro Trp Phe Arg Ile Ile
260 265 270

Ala Ala Arg Phe Leu Glu Arg Trp Trp Ala Asp Leu Asp Ala Ala Leu
275 280 285

Asn Thr Asp Lys His Glu Asp Trp Gly Thr Val His His Ile Asn Glu
290 295 300 

Ala 
305 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 293 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

Met Leu Arg Ser Leu Leu Arg Gly Leu Thr His Ile Pro Arg Val Asn
1 5 10 15

Ser Ala Gln Gln Pro Ser Cys Ala His Ala Arg Leu Gln Phe Lys Leu
20 25 30

Arg Ser Met Gln Leu Leu Ser Glu Asp Arg Thr Asp His Met Arg Gly
35 40 45

Ala Ser Thr Trp Ala Gly Gly Gln Ser Gln Asp Glu Leu Met Leu Lys
50 55 60

Asp Glu Cys Ile Leu Val Asp Val Glu Asp Asn Ile Thr Gly His Ala
65 70 75 80

Ser Lys Leu Glu Cys His Lys Phe Leu Pro His Gln Pro Ala Gly Leu
85 90 95

Leu His Arg Ala Phe Ser Val Phe Leu Phe Asp Asp Gln Gly Arg Leu
100 105 110

Leu Leu Gln Gln Arg Ala Arg Ser Lys Ile Thr Phe Pro Ser Val Trp
115 120 125

Thr Asn Thr Cys Cys Ser His Pro Leu His Gly Gln Thr Pro Asp Glu
130 135 140

Val Asp Gln Leu Ser Gln Val Ala Asp Gly Thr Val Pro Gly Ala Lys
145 150 155 160

Ala Ala Ala Ile Arg Lys Leu Glu His Glu Leu Gly Ile Pro Ala His
165 170 175

Gln Leu Pro Ala Ser Ala Phe Arg Phe Leu Thr Arg Leu His Tyr Cys
180 185 190

Ala Ala Asp Val Gln Pro Ala Ala Thr Gln Ser Ala Leu Trp Gly Glu
195 200 205

His Glu Met Asp Tyr Ile Leu Phe Ile Arg Ala Asn Val Thr Leu Ala
210 215 220

Pro Asn Pro Asp Glu Val Asp Glu Val Arg Tyr Val Thr Gln Glu Glu
225 230 235 240

Leu Arg Gln Met Met Gln Pro Asp Asn Gly Leu Gln Trp Ser Pro Trp
245 250 255

Phe Arg Ile Ile Ala Ala Arg Phe Leu Glu Arg Trp Trp Ala Asp Leu
260 265 270

Asp Ala Ala Leu Asn Thr Asp Lys His Glu Asp Trp Gly Thr Val His
275 280 285

His Ile Asn Glu Ala
290 

(2) INFORMATION FOR SEQ ID NO: 16: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 284 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

Met Ser Val Ser Ser Leu Phe Asn Leu Pro Leu Ile Arg Leu Arg Ser
1 5 10 15

Leu Ala Leu Ser Ser Ser Phe Ser Ser Phe Arg Phe Ala His Arg Pro
20 25 30

Leu Ser Ser Ile Ser Pro Arg Lys Leu Pro Asn Phe Arg Ala Phe Ser
35 40 45

Gly Thr Ala Met Thr Asp Thr Lys Asp Ala Gly Met Asp Ala Val Gln
50 55 60

Arg Arg Leu Met Phe Glu Asp Glu Cys Ile Leu Val Asp Glu Thr Asp
65 70 75 80

Arg Val Val Gly His Val Ser Lys Tyr Asn Cys His Leu Met Glu Asn
85 90 95

Ile Glu Ala Lys Asn Leu Leu His Arg Ala Phe Ser Val Phe Leu Phe
100 105 110

Asn Ser Lys Tyr Glu Leu Leu Leu Gln Gln Arg Ser Asn Thr Lys Val
115 120 125

Thr Phe Pro Leu Val Trp Thr Asn Thr Cys Cys Ser His Pro Leu Tyr
130 135 140

Arg Glu Ser Glu Leu Ile Gln Asp Asn Ala Leu Gly Val Arg Asn Ala
145 150 155 160

Ala Gln Arg Lys Leu Leu Asp Glu Leu Gly Ile Val Ala Glu Asp Val
165 170 175

Pro Val Asp Glu Phe Thr Pro Leu Gly Arg Met Leu Tyr Lys Ala Pro
180 185 190

Ser Asp Gly Lys Trp Gly Glu His Glu Leu Asp Tyr Leu Leu Phe Ile
195 200 205

Val Arg Asp Val Lys Val Gln Pro Asn Pro Asp Glu Val Ala Glu Ile
210 215 220

Lys Tyr Val Ser Arg Glu Glu Leu Lys Glu Leu Val Lys Lys Ala Asp
225 230 235 240



Ala Gly Glu Glu Gly Leu Lys Leu Ser Pro Trp Phe Arg Leu Val Val
245 250 255

Asp Asn Phe Leu Met Lys Trp Trp Asp His Val Glu Lys Gly Thr Leu
260 265 270

Val Glu Ala Ile Asp Met Lys Thr Ile His Lys Leu
275 280 

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 287 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

Met Ser Ser Ser Met Leu Asn Phe Thr Ala Ser Arg Ile Val Ser Leu
1 5 10 15

Pro Leu Leu Ser Ser Pro Pro Ser Arg Val His Leu Pro Leu Cys Phe
20 25 30

Phe Ser Pro Ile Ser Leu Thr Gln Arg Phe Ser Ala Lys Leu Thr Phe
35 40 45

Ser Ser Gln Ala Thr Thr Met Gly Glu Val Val Asp Ala Gly Met Asp
50 55 60

Ala Val Gln Arg Arg Leu Met Phe Glu Asp Glu Cys Ile Leu Val Asp
65 70 75 80

Glu Asn Asp Lys Val Val Gly His Glu Ser Lys Tyr Asn Cys His Leu
85 90 95

Met Glu Lys Ile Glu Ser Glu Asn Leu Leu His Arg Ala Phe Ser Val
100 105 110

Phe Leu Phe Asn Ser Lys Tyr Glu Leu Leu Leu Gln Gln Arg Ser Ala
115 120 125

Thr Lys Val Thr Phe Pro Leu Val Trp Thr Asn Thr Cys Cys Ser His
130 135 140

Pro Leu Tyr Arg Glu Ser Glu Leu Ile Asp Glu Asn Cys Leu Gly Val
145 150 155 160

Arg Asn Ala Ala Gln Arg Lys Leu Leu Asp Glu Leu Gly Ile Pro Ala
165 170 175
Glu Asp Leu Pro Val Asp Gln Phe Ile Pro Leu Ser Arg Ile Leu Tyr
180 185 190

Lys Ala Pro Ser Asp Gly Lys Trp Gly Glu His Glu Leu Asp Tyr Leu
195 200 205

Leu Phe Ile Ile Arg Asp Val Asn Leu Asp Pro Asn Pro Asp Glu Val
210 215 220

Ala Glu Val Lys Tyr Met Asn Arg Asp Asp Leu Lys Glu Leu Leu Arg 
225 230 235 240 

Lys Ala Asp Ala Glu Glu Glu Gly Val Lys Leu Ser Pro Trp Phe Arg 
245 250 255

Leu Val Val Asp Asn Phe Leu Phe Lys Trp Trp Asp His Val Glu Lys
260 265 270

Gly Ser Leu Lys Asp Ala Ala Asp Met Lys Thr Ile His Lys Leu
275 280 285

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 261 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

Thr Gly Pro Pro Pro Arg Phe Phe Pro Ile Arg Ser Pro Val Pro Arg
1 5 10 15

Thr Gln Leu Phe Val Arg Ala Phe Ser Ala Val Thr Met Thr Asp Ser
20 25 30

Asn Asp Ala Gly Met Asp Ala Val Gln Arg Arg Leu Met Phe Glu Asp
35 40 45

Glu Cys Ile Leu Val Asp Glu Asn Asn Arg Val Val Gly His Asp Thr
50 55 60

Lys Tyr Asn Cys His Leu Met Glu Lys Ile Glu Ala Glu Asn Leu Leu
65 70 75 80

His Arg Ala Phe Ser Val Phe Leu Phe Asn Ser Lys Tyr Glu Leu Leu
85 90 95

Leu Gln Gln Arg Ser Lys Thr Lys Val Thr Phe Pro Leu Val Trp Thr
100 105 110

Asn Thr Cys Cys Ser His Pro Leu Tyr Arg Glu Ser Glu Leu Ile Glu
115 120 125

Glu Asn Val Leu Gly Val Arg Asn Ala Ala Gln Arg Lys Leu Phe Asp
130 135 140

Glu Leu Gly Ile Val Ala Glu Asp Val Pro Val Asp Glu Phe Thr Pro
145 150 155 160

Leu Gly Arg Met Leu Tyr Lys Ala Pro Ser Asp Gly Lys Trp Gly Glu
165 170 175

His Glu Val Asp Tyr Leu Leu Phe Ile Val Arg Asp Val Lys Leu Gln
180 185 190

Pro Asn Pro Asp Glu Val Ala Glu Ile Lys Tyr Val Ser Arg Glu Glu
195 200 205

Leu Lys Glu Leu Val Lys Lys Ala Asp Ala Gly Asp Glu Ala Val Lys
210 215 220

Leu Ser Pro Trp Phe Arg Leu Val Val Asp Asn Phe Leu Met Lys Trp
225 230 235 240

Trp Asp His Val Glu Lys Gly Thr Ile Thr Glu Ala Ala Asp Met Lys
245 250 255

Thr Ile His Lys Leu
260 

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 288 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

Met Thr Ala Asp Asn Asn Ser Met Pro His Gly Ala Val Ser Ser Tyr 
1 5 10 15 

Ala Lys Leu Val Gln Asn Gln Thr Pro Glu Asp Ile Leu Glu Glu Phe
20 25 30

Pro Glu Ile Ile Pro Leu Gln Gln Arg Pro Asn Thr Arg Ser Ser Glu
35 40 45

Thr Ser Asn Asp Glu Ser Gly Glu Thr Cys Phe Ser Gly His Asp Glu
50 55 60

Glu Gln Ile Lys Leu Met Asn Glu Asn Cys Ile Val Leu Asp Trp Asp
65 70 75 80

Asp Asn Ala Ile Gly Ala Gly Thr Lys Lys Val Cys His Leu Met Glu
85 90 95

Asn Ile Glu Lys Gly Leu Leu His Arg Ala Phe Ser Val Phe Ile Phe
100 105 110

Asn Glu Gln Gly Glu Leu Leu Leu Gln Gln Arg Ala Thr Glu Lys Ile
115 120 125

Thr Phe Pro Asp Leu Trp Thr Asn Thr Cys Cys Ser His Pro Leu Cys
130 135 140

Ile Asp Asp Glu Leu Gly Leu Lys Gly Lys Leu Asp Asp Lys Ile Lys
145 150 155 160

Gly Ala Ile Thr Ala Ala Val Arg Lys Leu Asp His Glu Leu Gly Ile
165 170 175

Pro Glu Asp Glu Thr Lys Thr Arg Gly Lys Phe His Phe Leu Asn Arg
180 185 190

Ile His Tyr Met Ala Pro Ser Asn Glu Pro Trp Gly Glu His Glu Ile
195 200 205

Asp Tyr Ile Leu Phe Tyr Lys Ile Asn Ala Lys Glu Asn Leu Thr Val
210 215 220

Asn Pro Asn Val Asn Glu Val Arg Asp Phe Lys Trp Val Ser Pro Asn
225 230 235 240

Asp Leu Lys Thr Met Phe Ala Asp Pro Ser Tyr Lys Phe Thr Pro Trp
245 250 255

Phe Lys Ile Ile Cys Glu Asn Tyr Leu Phe Asn Trp Trp Glu Gln Leu
260 265 270

Asp Asp Leu Ser Glu Val Glu Asn Asp Arg Gln Ile His Arg Met Leu
275 280 285



(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 456 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
Met Asp Thr Leu Leu Lys Thr Pro Asn Leu Glu Phe Leu Pro His Gly
1 5 10 15

Phe Val Lys Ser Phe Ser Lys Phe Gly Lys Cys Glu Gly Val Cys Val
20 25 30

Lys Ser Ser Ala Leu Leu Glu Leu Val Pro Glu Thr Lys Lys Glu Asn
35 40 45

Leu Asp Phe Glu Leu Pro Met Tyr Asp Pro Ser Lys Gly Val Val Asp
50 55 60

Leu Ala Val Val Gly Gly Gly Pro Ala Gly Leu Ala Val Ala Gln Gln
65 70 75 80

Val Ser Glu Ala Gly Leu Ser Val Cys Ser Ile Asp Pro Pro Lys Leu
85 90 95

Ile Trp Pro Asn Asn Tyr Gly Val Trp Val Asp Glu Phe Glu Ala Met
100 105 110

Asp Leu Leu Asp Cys Leu Asp Ala Thr Trp Ser Gly Ala Val Tyr Ile
115 120 125

Asp Asp Thr Lys Asp Leu Arg Pro Tyr Gly Arg Val Asn Arg Lys Gln
130 135 140

Leu Lys Ser Lys Met Met Gln Lys Cys Ile Asn Gly Val Lys Phe His
145 150 155 160

Gln Ala Lys Val Ile Lys Val Ile His Glu Glu Lys Ser Met Leu Ile
165 170 175

Cys Asn Asp Gly Thr Ile Gln Ala Thr Val Val Leu Asp Ala Thr Gly
180 185 190

Phe Ser Arg Leu Val Gln Tyr Asp Lys Pro Tyr Asn Pro Gly Tyr Gln
195 200 205

Val Ala Tyr Gly Ile Leu Ala Glu Val Glu Glu His Pro Phe Asp Lys
210 215 220

Met Val Phe Met Asp Trp Arg Asp Ser His Leu Asn Asn Glu Leu Lys
225 230 235 240

Glu Arg Asn Ser Ile Pro Thr Phe Leu Tyr Ala Met Pro Phe Ser Ser
245 250 255

Asn Arg Ile Phe Leu Glu Glu Thr Ser Leu Val Ala Arg Pro Gly Leu
260 265 270

Arg Met Asp Asp Ile Gln Glu Arg Met Val Ala Arg Leu His Leu Gly
275 280 285

Ile Lys Val Lys Ser Ile Glu Glu Asp Glu His Cys Val Ile Pro Met
290 295 300

Gly Gly Pro Leu Pro Val Leu Pro Gln Arg Val Val Gly Ile Gly Gly
305 310 315 320

Thr Ala Gly Met Val His Pro Ser Thr Gly Tyr Met Val Ala Arg Thr
325 330 335

Leu Ala Ala Ala Pro Val Val Ala Asn Ala Ile Ile Tyr Leu Gly Ser
340 345 350

Glu Ser Ser Gly Glu Leu Ser Ala Glu Val Trp Lys Asp Leu Trp Pro
355 360 365

Ile Glu Arg Arg Arg Gln Arg Glu Phe Phe Cys Phe Gly Met Asp Ile
370 375 380

Leu Leu Lys Leu Asp Leu Pro Ala Thr Arg Arg Phe Phe Asp Ala Phe
385 390 395 400

Phe Asp Leu Glu Pro Arg Tyr Trp His Gly Phe Leu Ser Ser Arg Leu
405 410 415

Phe Leu Pro Glu Leu Ile Val Phe Gly Leu Ser Leu Phe Ser His Ala
420 425 430

Ser Asn Thr Ser Arg Glu Ile Met Thr Lys Gly Thr Pro Leu Val Met
435 440 445

Ile Asn Asn Leu Leu Gln Asp Glu
450 455 

(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 524 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

Met Glu Cys Val Gly Ala Arg Asn Phe Ala Ala Met Ala Val Ser Thr
1 5 10 15

Phe Pro Ser Trp Ser Cys Arg Arg Lys Phe Pro Val Val Lys Arg Tyr
20 25 30

Ser Tyr Arg Asn Ile Arg Phe Gly Leu Cys Ser Val Arg Ala Ser Gly
35 40 45

Gly Gly Ser Ser Gly Ser Glu Ser Cys Val Ala Val Arg Glu Asp Phe
50 55 60

Ala Asp Glu Glu Asp Phe Val Lys Ala Gly Gly Ser Glu Ile Leu Phe
65 70 75 80

Val Gln Met Gln Gln Asn Lys Asp Met Asp Glu Gln Ser Lys Leu Val
85 90 95

Asp Lys Leu Pro Pro Ile Ser Ile Gly Asp Gly Ala Leu Asp His Val
100 105 110

Val Ile Gly Cys Gly Pro Ala Gly Leu Ala Leu Ala Ala Glu Ser Ala
115 120 125

Lys Leu Gly Leu Lys Val Gly Leu Ile Gly Pro Asp Leu Pro Phe Thr
130 135 140

Asn Asn Tyr Gly Val Trp Glu Asp Glu Phe Asn Asp Leu Gly Leu Gln
145 150 155 160

Lys Cys Ile Glu His Val Trp Arg Glu Thr Ile Val Tyr Leu Asp Asp
165 170 175

Asp Lys Pro Ile Thr Ile Gly Arg Ala Tyr Gly Arg Val Ser Arg Arg
180 185 190

Leu Leu His Glu Glu Leu Leu Arg Arg Cys Val Glu Ser Gly Val Ser
195 200 205

Tyr Leu Ser Ser Lys Val Asp Ser Ile Thr Glu Ala Ser Asp Gly Leu
210 215 220

Arg Leu Val Ala Cys Asp Asp Asn Asn Val Ile Pro Cys Arg Leu Ala
225 230 235 240

Thr Val Ala Ser Gly Ala Ala Ser Gly Lys Leu Leu Gln Tyr Glu Val
245 250 255

Gly Gly Pro Arg Val Cys Val Gln Thr Ala Tyr Gly Val Glu Val Glu
260 265 270

Val Glu Asn Ser Pro Tyr Asp Pro Asp Gln Met Val Phe Met Asp Tyr
275 280 285

Arg Asp Tyr Thr Asn Glu Lys Val Arg Ser Leu Glu Ala Glu Tyr Pro
290 295 300

Thr Phe Leu Tyr Ala Met Pro Met Thr Lys Ser Arg Leu Phe Phe Glu
305 310 315 320

Glu Thr Cys Leu Ala Ser Lys Asp Val Met Pro Phe Asp Leu Leu Lys
325 330 335

Thr Lys Leu Met Leu Arg Leu Asp Thr Leu Gly Ile Arg Ile Leu Lys
340 345 350

Thr Tyr Glu Glu Glu Trp Ser Tyr Ile Pro Val Gly Gly Ser Leu Pro
355 360 365

Asn Thr Glu Gln Lys Asn Leu Ala Phe Gly Ala Ala Ala Ser Met Val
370 375 380

His Pro Ala Thr Gly Tyr Ser Val Val Arg Ser Leu Ser Glu Ala Pro
385 390 395 400

Lys Tyr Ala Ser Val Ile Ala Glu Ile Leu Arg Glu Glu Thr Thr Lys
405 410 415

Gln Ile Asn Ser Asn Ile Ser Arg Gln Ala Trp Asp Thr Leu Trp Pro
420 425 430

Pro Glu Arg Lys Arg Gln Arg Ala Phe Phe Leu Phe Gly Leu Ala Leu
435 440 445

Ile Val Gln Phe Asp Thr Glu Gly Ile Arg Ser Phe Phe Arg Thr Phe
450 455 460

Phe Arg Leu Pro Lys Trp Met Trp Gln Gly Phe Leu Gly Ser Thr Leu
465 470 475 480

Thr Ser Gly Asp Leu Val Leu Phe Ala Leu Tyr Met Phe Val Ile Ser
485 490 495

Pro Asn Asn Leu Arg Lys Gly Leu Ile Asn His Leu Ile Ser Asp Pro
500 505 510

Thr Gly Ala Thr Met Ile Lys Thr Tyr Leu Lys Val
515 520

1. An isolated eukaryotic enzyme having the amino acid
sequence of SEQ ID NO: 2, 4, 14, 15, 16 or 18.

2. An isolated eukaryotic enzyme of Claim 1 which is an ε
cyclase enzyme having the amino acid sequence of SEQ ID NO: 2.

3. An isolated DNA sequence comprising a gene encoding
the eukaryotic ε cyclase of Claim 2.

4. The isolated DNA sequence according to Claim 3,
having the nucleic acid sequence of SEQ ID NO: 1.

5. An expression vector comprising the DNA sequence of
Claim 3.

6. The expression vector according to Claim 5 which is 
pATeps deposited with the American Type Culture Collection on 
March 4, 1996 under accession number 98005.

7. A host containing the expression vector of Claim 5. 

8. A host containing the expression vector of Claim 6. 

9. An isolated eukaryotic enzyme of Claim 1, which is an
isopentenyl pyrophosphate (IPP) isomerase enzyme having the amino acid
sequence of SEQ ID NOS: 14, 15, 16 or 18.

10. An isolated DNA sequence comprising a gene encoding 
the IPP enzyme of Claim 9. 

11. The isolated DNA sequence of Claim 10, having the 
nucleic acid sequence of SEQ ID NOS: 9, 10, 11 or 12. 

12. An expression vector comprising the DNA sequence of 
Claim 10. 
13. The expression vector of Claim 11 which is pHP05,
pMDP1, pATDP7 or pHP04, deposited with the American Type
Culture Collection on March 4, 1996 under accession Nos. 
98000, 98001, 98002 or 98004. 

14. A host containing the expression vector of Claim 12. 

15. The isolated eukaryotic enzyme of Claim 1, which is a
β-carotene hydroxylase enzyme having the amino acid sequence
of SEQ ID NO: 4. 

16. An isolated DNA sequence comprising a gene encoding 
the β-carotene hydroxylase enzyme of Claim 15.

17. The isolated DNA sequence according to Claim 16,
having the nucleic acid sequence of SEQ ID NO: 3. 

18. An expression vector comprising the DNA sequence of 
Claim 16. 

19. The expression vector according to Claim 18 which is
pATOHB deposited with the American Type Culture Collection on 
March 4, 1996 under accession number 98003. 

20. A host containing the expression vector of Claim 18.

21. A host containing the expression vector of Claim 19.

22. A DNA sequence which, when incorporated into a
prokaryotic host, results in the expression of a eukaryotic
carotenoid biosynthetic enzyme,

wherein said DNA sequence comprises a truncated portion
of the naturally occurring DNA sequence encoding said
eukaryotic carotenoid biosynthetic enzyme, wherein said
truncated portion comprises said natural sequence minus at
least one codon at the 5' terminus.

23. The DNA sequence of Claim 22, wherein said eukaryotic
carotenoid biosynthetic enzyme is β-carotene hydroxylase.

24. The DNA sequence of Claim 23, which is a BglII-3'
end exofragment of SEQ ID NO: 3 fused to a 5' ATG start codon.

25. A method for screening for eukaryotic genes involved
in carotenoid biosynthesis, metabolism or degradation,
comprising the steps of:

engineering a prokaryotic host which accumulates a
carotenoid or carotenoid precursor or which is deficient in an
enzyme of the carotenoid pathway;

transforming said host with DNA which may contain a
eukaryotic carotenoid biosynthetic gene;

culturing said transformed host to obtain colonies; and

screening for colonies exhibiting a different visual
appearance than colonies of the untransformed host.

26. The method of Claim 25, wherein said prokaryotic host
is E. coli.

27. A method for producing a carotenoid, comprising the 
steps of: 

transforming a host with DNA which comprises a eukaryotic 
carotenoid biosynthetic gene; 

culturing said host for a time sufficient for said host 
to produce said carotenoid; and 

collecting said carotenoid from the host. 
28. The method of Claim 26, wherein said DNA further
comprises an isopentenyl pyrophosphate isomerase gene.

29. A method for inhibiting carotenoid biosynthesis in a
host, comprising the steps of:

transforming said host with antisense DNA to a eukaryotic
carotenoid biosynthesis gene; and 
culturing said host. 

30. A method for increasing production of a secondary 
metabolite of isopentenyl pyrophosphate (IPP) by a host,
comprising the steps of: 

transforming said host with DNA that comprises an 
isopentenyl pyrophosphate isomerase gene; and

culturing said host for a time sufficient to produce said 
secondary metabolite; and 

recovering said secondary metabolite from said host. 

31. The method of Claim 30, wherein said secondary 
metabolite is a carotenoid. 

32. A method for screening for secondary metabolites, 
comprising: 

engineering a host which accumulates a secondary 
metabolite or secondary metabolite precursor of isopentenyl
pyrophosphate (IPP); and

transforming said host with DNA that may contain an IPP 
isomerase gene; and 

culturing said host for a time sufficient to accumulate 
said secondary metabolite or precursor; and 

screening for said secondary metabolite or precursor. 

acaaaaggaaataattag attcctctttcigcttgctataccttgata 48

gaacaatataacaatggtgtaagtcttctc gctgtattcgaaattatttggaggaggaaa 108

atggagtgtgttggggctaggaatttcgca gcaatggcggtttcaacatttccgtcatgg 168
1 M E C V G A R N F A A M A V S T F P S W

agttgtcgaaggaaatttccagtggttaag agatacagctataggaatattcgtttcggt 228
21 S C R R K F P V V K R Y S Y R N I R F G

ttgtgtagtgtcagagctagcggcggcgga agttccggtagtgagagttgtgtagcggtg 288
41 L C S V R A S G G G S S G S E S C V A V

agagaagatttcgctgacgaagaagatttt gtgaaagctggtggttctgagattctattt 348
61 R E D F A D E E D F V K A G G S E I L F

gttcaaatgcagcagaacaaagatatggat gaacagtctaagcttgttgataagttgcct 408
81 V Q M Q Q N K D M D E Q S K L V D K L P

cctatatcaattggtgatggtgctttggat catgtggttattggttgtggtcctgctggt 468
101 P I S I G D G A L D H V V I G C G P A G

ttagccttggctgcagaatcagctaagctt ggattaaaagttggactcattggtccagat 528
121 L A L A A E S A K L G L K V G L I G P D

cttccttttactaacaattacggtgtttgg gaagatgaattcaatgatcttgggctgcaa 588
141 L P F T N N Y G V W E D E F N D L G L Q

aaatgtattgagcatgtttggagagagact attgtgtatctggatgatgacaagcctatt 648
161 K C I E H V W R E T I V Y L D D D K P I

accattggccgtgcttatggaagagttagt cgacgtttgctccatgaggagcttttgagg 708
181 T I G R A Y G R V S R R L L H E E L L R

aggtgtgtcgagtcaggtgtctcgtacctt agctcgaaagttgacagcataacagaagct 768
201 R C V E S G V S Y L S S K V D S I T E A

tctgatggccttagacttgttacttgtgac gacaataacgtcattccctgcaggcttgcc 828
221 S D G L R L V A C D D N N V I P C R L A

actgttgcttctggagcagcttcgggaaag ctcttgcaatacgaagttggtggacctaga 888
241 T V A S G A A S G K L L Q Y E V G G P R
gtctgtgtgcaaactgcatacggcgtggag gttgaggtggaaaatagtccatatgatcca 948
261 V C V Q T A Y G V E V E V E N S P Y D P

gatcaaatggttttcatggattacagagat tatactaacgagaaagttcggagcttagaa 1008 
281 D Q M V F M D Y R D Y T N E K V R S L E

gctgagtatccaacgtttctgtacgccatg cctatgacaaagtcaagactcttcttcgag 1068 
301 A E Y P T F L Y A M P M T K S R L F F E 

gagacatgtttggcctcaaaagatgtcatg ccctttgatttgctaaaaacgaagctcatg 1128
321 E T C L A S K D V M P F D L L K T K L M

ttaagattagatacactcggaattcgaatt ctaaagacttacgaagaggagtggtcctat 1188
341 L R L D T L G I R I L K T Y E E E W S Y 

atcccagttggtggttccttgccaaacacc gaacaaaagaatctcgcctttggtgctgcc 1248
361 I P V G G S L P N T E Q K N L A F G A A

gctagcatggtacatcccgcaacaggctat tcagttgtgagatctttgtctgaagctcca 1308 
381 A S M V H P A T G Y S V V R S L S E A P

aaatatgcatcagtcatcgcagagatacta agagaagagactaccaaacagatcaacagt 1368 
401 K Y A S V I A E I L R E E T T K Q I N S

aatatttcaagacaagcttgggatacttta tggccaccagaaaaaaaaagacagagagca 1428 
421 N I S R Q A W D T L W P P E R K R Q R A

aatatttcaagacaagcttgggatacttta tggccaccagaaaggaaaagacagagagca 1488 
441 F F L F G L A L I V Q F D T E G I R S F 

ttccgtactttcttccgccttccaaaatgg atgtggcaagggtttctaggatcaacatta 1548 
461 F R T F F R L P K W M W Q G F L G S T L

acatcaggagatctcgttctctttgcttta tacatgttcgtcatttcaccaaacaatttg 1608 
481 T S G D L V L F A L Y M F V I S P N N L

agaaaaggtctcatcaatcatctcatctct gatccaaccggagcaaccatgataaaaacc 1668 
501 R K G L I N H L I S D P T G A T M I K T

tatctcaaagtatgatttacttatcaactc ttaggtttgtgtatatatatgttgatttat 1728 
521 Y L K V 
ctg^^l^tcgatcaaagaatggtatgtgg gttactaggaagttggaaacaaacatgtat 1778
agaatctaaggagtgatcgaaatggagatg gaaacgaaaagaaaaaaatcagtctttgtt 1848 
ttgtggttagtg 1860

FIG. 4C

1 gctctttctc ctcctcctct accgatttcc gactccgcct cccgaaatcc 

51 ttatccggat tctctccgtc tcttcgattt aaacgctttt ctgtctgtta 
101 cgtcgtcgaa gaacggagac agaattctcc gattgagaac gatgagagac 
151 cggagagcac gagctccaca aacgctatag acgctgagta tctggcgttg
201 cgtttggcgg agaaattgga gaggaagaaa tcggagaggt ccacttatct 
251 aatcgctgct atgttgtcga gctttggtat cacttctatg gctgttatgg 
301 ctgtttacta cagattctct tggcaaatgg agggaggtga gatctcaaig
351 ttggaaatgt ttggtacatt tgctctctct gttggtgctg ctgttggtat
401 ggaattctgg gcaagatggg ctcatagagc tctgtggcac gcttctctat 
451 ggaatatgca tgagtcacat cacaaaccaa gagaaggacc gtttgagcta 
501 aacgatgttt ttgctatagt gaacgctggt ccagcgattg gtctcctctc 
551 ttatggattc ttcaataaag gactcgttcc tggtctctgc tttggcgccg 
601 ggttaggcat aacggtgttt ggaatcgcct acatgtttgt ccacgatggt 
651 ctcgtgcaca agcgtttccc tgtaggtccc atcgccgacg tcccttacct 
701 ccgaaaggtc gccgccgctc accagctaca tcacacagac aagttcaatg 
751 gtgtaccata tggactgttt cttggaccca aggaattgga agaagttgga 
801 ggaaatgaag agttagataa ggagattagt cggagaatca aatcatacaa 
851 aaaggcctcg ggctccgggt cgagttcgag ttcttgactt taaacaagtt 
901 ttaaatccca aattcttttt ttgtcttctg tcattatgat catcttaaga 
951 cggtct 

FIG. 5 

Box I Observations where certain claims were found unsearchable (Continuation of item 1 of first sheet)

This international report has not been established in respect of certain claims under Article 17(2)(a) for the following reasons:

1. Claims Nos.:
because they relate to subject matter not required to be searched by this Authority, namely:

2. Claims Nos.:
because they relate to parts of the international application that do not comply with the prescribed requirements to such
an extent that no meaningful international search can be carried out, specifically:

3. Claims Nos.:
because they are dependent claims and are not drafted in accordance with the second and third sentences of Rule 6.4(a).

Box II Observations where unity of invention is lacking (Continuation of item 2 of first sheet)
This International Searching Authority found multiple inventions in this international application, as follows:
Please See Extra Sheet.

1. As all required additional search fees were timely paid by the applicant, this international search report covers all searchable
claims.

2. As all searchable claims could be searched without effort justifying an additional fee, this Authority did not invite payment
of any additional fee.

3. As only some of the required additional search fees were timely paid by the applicant, this international search report covers
only those claims for which fees were paid, specifically claims Nos.:

4. No required additional search fees were timely paid by the applicant. Consequently, this international search report is
restricted to the invention first mentioned in the claims; it is covered by claims Nos.:

Remark on Protest
[ ] The additional search fees were accompanied by the applicant's protest.
[X] No protest accompanied the payment of additional search fees.
A. CLASSIFICATION OF SUBJECT MATTER:

IPC (6):
C12N 1/21, 5/10, 9/02, 9/10, 9/90, 15/53, 15/54, 15/61, 15/63; C12P 23/00; C12Q 1/68

A. CLASSIFICATION OF SUBJECT MATTER: 

US CL : 

435/6, 67, 189, 193, 233, 252.3, 254.11, 320.1, 325, 419; 536/23.2

B. FIELDS SEARCHED 

Electronic data bases consulted (Name of data base and where practicable terms used):
Dialog, APS

search terms: IPP, isopentenyl pyrophosphate isomerase, epsilon cyclase, isopentenyl diphosphate isomerase, carotene
hydroxylase, carotenoid, synthesis, biosynthesis, Arabidopsis thaliana, Haematococcus pluvialis

BOX II. OBSERVATIONS WHERE UNITY OF INVENTION WAS LACKING
This ISA found multiple inventions as follows:

This application contains the following inventions or groups of inventions which are not so linked as to form a single
inventive concept under PCT Rule 13.1. 

Group I, claims 2-8, drawn to epsilon cyclase enzyme, DNA encoding epsilon cyclase, vectors and host cells
comprising said DNA.

Group II, claims 9-14, drawn to isopentenyl pyrophosphate (IPP) isomerase enzymes, DNA encoding IPP isomerase,
vectors and host cells comprising said DNA.

Group III, claims 15-24, drawn to beta carotene hydroxylase enzyme, DNA encoding beta carotene hydroxylase,
vectors and host cells comprising said DNA. 

Group IV, claims 25, 26, and 32, drawn to methods of screening using DNA comprising carotenoid biosynthesis genes. 
Group V, claims 27, 28, 30, and 31, drawn to methods of using DNA encoding IPP isomerase.
Group VI, claim 29, drawn to a method of using antisense DNA. 



Claim 1 is generic to Groups I, II, and III and will be examined with the elected Group(s) to the extent it reads thereon.

The inventions listed as Groups I-VI do not relate to a single inventive concept under PCT Rule 13.1 because, under
PCT Rule 13.2, they lack the same or corresponding special technical features for the following reasons: The claims of
Group I share a technical feature of epsilon cyclase; the claims of Group II share a technical feature of IPP isomerase;
the claims of Group III share a technical feature of beta carotene hydroxylase; the claims of Group IV share a technical
feature of a screening method; the claims of Group V share a technical feature of methods of using DNA encoding IPP
isomerase; and the claim of Group VI has a technical feature of antisense DNA. Carotenoid biosynthetic enzymes and
genes were known in the art. See the references cited on page 3 of the disclosure; see also Spurgeon et al. (Arch.
Biochem. Biophys. 230(2): 446-454 (1984); IPP isomerase). Hence, the various Groups of inventions do not share a
technical relationship involving one or more of the same or corresponding special technical features, i.e., those
technical features that define a contribution which each invention, considered as a whole, makes over the prior art.
They therefore do not fulfill the requirements of unity of invention and a holding of lack of unity for examination
purposes is proper. Accordingly, the claims are not so linked by a special technical feature within the meaning of PCT
Rule 13.2 so as to form a single inventive concept.